Abstract

We investigate the sparse recovery problem of reconstructing a high-dimensional non-negative
sparse vector from lower-dimensional linear measurements. While much work has focused on
dense measurement matrices, sparse measurement schemes are crucial in applications, such as
DNA microarrays and sensor networks, where dense measurements are not practically feasible.
One possible construction uses the adjacency matrices of expander graphs, which often leads to
recovery algorithms much more efficient than $\ell_1$ minimization. However, to date, constructions
based on expanders have required very high expansion coefficients, which can make
the construction of such graphs difficult and the size of the recoverable sets small.

In this paper, we construct sparse measurement matrices for the recovery of non-negative
vectors, using perturbations of the adjacency matrix of an expander graph with a much smaller
expansion coefficient. We present a necessary and sufficient condition for $\ell_1$ optimization to
successfully recover the unknown vector and obtain expressions for the recovery threshold. For
certain classes of measurement matrices, this necessary and sufficient condition is further equivalent
to the existence of a "unique" vector in the constraint set, which opens the door to
alternatives to $\ell_1$ minimization. We further show that the minimal expansion we use
is necessary for any graph for which sparse recovery is possible, and therefore our construction
is tight. We also present a novel recovery algorithm that exploits expansion and is much
faster than $\ell_1$ optimization. Finally, we demonstrate through theoretical bounds, as well as
simulation, that our method is robust to noise and approximate sparsity.



1 Introduction 



We investigate the problem of signal recovery in compressed sensing, i.e., the problem of reconstructing
a signal $x$ that is assumed to be $k$-sparse from $m$ measurements $y = Ax$, where $m$
is smaller than the ambient dimension $n$ of the signal but larger than $k$. Here $A$ is the $m \times n$
measurement matrix. In this paper, we focus on the case where the nonzero entries of $x$
are positive, a special case that is of great practical interest.

In compressed sensing, $A$ is often a dense matrix drawn from some ensemble of random matrices
(see, e.g., [3]). In this paper, however, we focus on sparse measurement matrices. This
is important for several reasons. In some applications, such as DNA microarrays, the cost of
each measurement increases with the number of coordinates of $x$ involved [28]. Also, sparse
measurement matrices often enable faster decoding algorithms (e.g., [7, 11, 19]) than
the general linear programming decoder [3]. In addition, unlike random
measurement matrices (such as Gaussian or Bernoulli), which only guarantee the recovery of sparse
vectors with high probability, expander graphs give deterministic guarantees (see, e.g., [11], which
gives a deterministic guarantee for the fast algorithm proposed there, and [6] for concentration lemmas
on expander graphs).

Unlike Gaussian matrices, for which reasonably sharp bounds on the thresholds that guarantee
recovery of sparse signals by linear programming have been obtained [2], no such sharp bounds
exist for expander-graph-based measurements. This is the main focus of the current paper, for the
special case where the $k$-sparse vector is non-negative.

It turns out that, due to the additional non-negativity constraint, one requires significantly
fewer measurements to recover $k$-sparse non-negative signals. The non-negative case has also been
studied in [5] for Gaussian matrices and in the work of Bruckstein et al. [10], which further
proposes a "matching pursuit" type of recovery algorithm. See also [29] for another example.

The success of a measurement matrix is often certified by a so-called Restricted Isometry Property
(RIP), which guarantees the success of $\ell_1$ minimization. Recently, Berinde et al. showed
that the adjacency matrices of suitable unbalanced expander graphs satisfy an RIP property for
the $\ell_1$ norm. However, RIP conditions are only sufficient and often fail to characterize
all good measurement matrices. A complete characterization of good measurement matrices
was recently given in terms of their null space. More precisely, as stated in previous work (e.g.,
[17, 20, 21, 22]), if for every vector $w$ in the null space of $A$ the sum of the absolute values of any $k$
elements of $w$ is less than the sum of the absolute values of the remaining elements, then the solution
to $\min \|x\|_0$ subject to $Ax = y$ can always be obtained by solving $\min \|x\|_1$ subject to $Ax = y$,
provided $x$ is $k$-sparse.¹ This condition is stated in the work of Donoho [1] as the $k$-neighborly
polytope property of $A$, and in the work of Candès et al. as the uncertainty principle [3]. Donoho
et al. have also shown that this condition holds with high probability for random
i.i.d. Gaussian matrices and are therefore able to compute fairly tight thresholds on when
linear-programming-based compressed sensing works [2]. The first analysis of the null space for expander
graphs was done by Indyk [9], where it was shown that every $(2k, \epsilon)$ expander graph² with
$\epsilon < 1/6$ has a well-supported null space. See also [18] for explicit constructions using expander
graphs.

Furthermore, using Theorem 1 of [14], which is a generalization of the null space property
theorem to the recovery of approximately sparse signals, Indyk's result gives an upper bound on
the error when linear programming is used to recover approximately $k$-sparse vectors from expander
graph measurements.

¹Here $\|\cdot\|_0$ denotes the number of non-zero entries of its argument vector and $\|\cdot\|_1$ is the standard $\ell_1$ norm.
²We shall formally define $(k, \epsilon)$ expander graphs shortly.

Contributions of the current paper. We present a necessary and sufficient condition that
completely characterizes the success of $\ell_1$ minimization for non-negative signals, similar to the null
space condition for the general case. Our condition requires that all the vectors in the null space
of $A$ have sufficiently large negative support (i.e., a large number of negative entries). It further
turns out that, for a certain class of measurement matrices $A$, this condition is nothing but the
condition for the existence of a "unique" vector in the constraint set $\{x \mid Ax = y, x \ge 0\}$. This
suggests that any other convex optimization problem over this constraint set can also find the
solution. (We exploit this fact later to find faster alternatives to $\ell_1$ minimization.)

We then use the necessary and sufficient characterization to construct sparse measurement 
matrices. Our construction relies on starting with the adjacency matrix of an unbalanced expander 
(with constant degree) and adding suitable small perturbations to the non-zero entries. 

Several sparse matrix constructions rely on adjacency matrices of expander graphs [6, 9, 13, 15].
In these works, the technical arguments require very large expansion coefficients, in particular,
$1 - \epsilon > 3/4$, in order to guarantee a large number of unique neighbors [23] for the expanding sets.
A critical innovation of our work is that we require much smaller expansion, namely $1 - \epsilon = 1/d$,
where $d$ is the number of non-zero entries in every column of $A$. In fact, we show that an expansion
of $1 - \epsilon = 1/d$ is necessary for any matrix that works for compressed sensing. These two results
show that for non-negative vectors an expansion of $1 - \epsilon = 1/d$ is necessary and sufficient, and the small
expansion requirement allows a much larger set of recoverable signals.

The reason for this different requirement is that we use expansion in a different way than
previous work: we do not require a unique neighbor property but instead rely on Hall's theorem,
which guarantees that an expansion of $1 - \epsilon = 1/d$ yields perfect matchings for the expanding sets.
The matching, combined with perturbations of the entries, guarantees full-rank sub-matrices, which
in turn translates into the null space characterization we need.
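Hall's theorem states that a bipartite graph contains a matching saturating a left set $S$ exactly when every $T \subseteq S$ satisfies $|\Gamma(T)| \ge |T|$, which is the $1 - \epsilon = 1/d$ expansion condition. The following Python sketch (function names are ours, for illustration only; brute force, so toy graphs only) checks that condition and builds a maximum matching with augmenting paths:

```python
from itertools import combinations

def neighbours(adj, S):
    """Union of the right neighbours of the left vertices in S."""
    out = set()
    for v in S:
        out |= adj[v]
    return out

def hall_condition(adj, max_size):
    """Brute force: does every left set S with |S| <= max_size satisfy |N(S)| >= |S|?"""
    left = list(adj)
    for r in range(1, max_size + 1):
        for S in combinations(left, r):
            if len(neighbours(adj, S)) < r:
                return False
    return True

def max_matching(adj, S):
    """Kuhn's augmenting-path algorithm: a maximum matching of the left set S into the right side."""
    owner = {}  # right vertex -> left vertex currently matched to it

    def augment(u, seen):
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                # Take a free right vertex, or re-route its current owner elsewhere.
                if v not in owner or augment(owner[v], seen):
                    owner[v] = u
                    return True
        return False

    for u in S:
        augment(u, set())
    return {u: v for v, u in owner.items()}

adj = {0: {"a", "b"}, 1: {"a", "b"}, 2: {"b", "c"}}
print(hall_condition(adj, 3), len(max_matching(adj, [0, 1, 2])))  # True 3
```

By Hall's theorem the two outputs agree: the condition holds for all subsets of $\{0, 1, 2\}$ exactly when the matching saturates all three left vertices.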

Finally, we propose a fast alternative to $\ell_1$ optimization for recovering the unknown $x$. The
method first identifies a large portion of the entries of $x$ that are zero, and
then solves an "overdetermined" system of linear equations to determine the remaining unknown
components of $x$. Simulations are given to demonstrate the efficacy of the method and its robustness to
noise.
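As an illustration only (this is not necessarily the algorithm developed later in the paper), the two-stage idea can be sketched in Python with NumPy under two assumptions we add: the measurements are noiseless and $A$ is entrywise non-negative. Then a zero measurement $y_j = 0$ forces every $x_i$ with $A_{ji} > 0$ to be zero, and the surviving columns are fitted by least squares. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def two_stage_recover(A, y, tol=1e-12):
    """Hypothetical two-stage decoder for non-negative x and entrywise non-negative A (noiseless).

    Stage 1: if y_j == 0, every x_i with A[j, i] > 0 must be zero, since row j
    is a sum of non-negative terms.
    Stage 2: least squares on the surviving columns (ideally an overdetermined system).
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    zero_rows = np.abs(y) <= tol
    forced_zero = (A[zero_rows] > 0).any(axis=0)   # columns touching a zero measurement
    keep = np.flatnonzero(~forced_zero)
    x_hat = np.zeros(A.shape[1])
    if keep.size:
        sol, *_ = np.linalg.lstsq(A[:, keep], y, rcond=None)
        x_hat[keep] = sol
    return x_hat

# Toy left 2-regular matrix (4 measurements, 5 unknowns) and a 1-sparse signal.
A = np.array([[1, 0, 0, 1, 1],
              [1, 1, 0, 0, 0],
              [0, 1, 1, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)
x = np.array([0.0, 0.0, 2.0, 0.0, 0.0])
x_hat = two_stage_recover(A, A @ x)
print(x_hat)
```

Here rows 0 and 1 of $y$ are zero, which eliminates columns 0, 1, 3 and 4 and leaves a single unknown fitted from two equations.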

2 Problem Formulation 

The goal in compressed sensing is to recover a sparse vector from a set of under-determined linear
equations. In many real-world applications the original data vector is non-negative, which is the case
that we focus on in this paper. The original problem of compressed sensing for non-negative
input vectors is the following:

$$\min_{x} \|x\|_0 \quad \text{subject to} \quad Ax = y,\ x \ge 0, \qquad (1)$$

where $A \in \mathbb{R}^{m \times n}$ is the measurement matrix, $y \in \mathbb{R}^m$ is called the observation vector, $x \in \mathbb{R}^n$ is the unknown
vector, which is known to be $k$-sparse, i.e., to have only $k$ nonzero entries, and $\|\cdot\|_0$ is the $\ell_0$
norm, i.e., the number of nonzero entries of a given vector. The typical situation is that $n > m > k$.
Although (1) is an NP-hard problem, Donoho and Tanner have shown that, for a class of
matrices $A$ with a so-called outwardly $k$-neighborly property and $x$ at most $k$-sparse,
the solution to (1) is unique and can be recovered via the following linear program:

$$\min_{x} \|x\|_1 \quad \text{subject to} \quad Ax = y,\ x \ge 0. \qquad (2)$$



They also show that i.i.d. Gaussian random $m \times n$ matrices with $m = n/2$ are outwardly $m/8$-neighborly
with high probability, and thus allow the recovery of $n/16$-sparse vectors $x$ via linear
programming. They further define a weak neighborliness notion, based upon which they show that the
same Gaussian random matrices allow the recovery of almost all $0.279n$-sparse vectors $x$ via
$\ell_1$-optimization for sufficiently large $n$.

In this paper, we primarily seek the answer to a similar question when the measurement matrix
$A$ is sparse and, in particular, when $A$ is the adjacency matrix of an unbalanced bipartite graph
with constant left degree $d$. The aim is to analyze the outwardly neighborly conditions for this class
of matrices and to come up with sparse structures that allow the recovery of vectors with sparsity
proportional to the number of equations.

3 Null Space Characterization and Complete Rank 

We begin by stating an equivalent version of the outwardly neighborly condition, which is in fact
similar to the null space property mentioned in the introduction, but for the non-negative
case. Later we show that this condition has a much simpler interpretation for the special case of
regular bipartite graphs. We present the first theorem in the same style as the null space property for general signals.

Theorem 3.1. Let $A$ be a non-negative $m \times n$ matrix and $k < n/2$ be a positive integer. The
following two statements are equivalent:

1. For every non-negative vector $x_0$ with at most $k$ nonzero entries, $x_0$ is the unique solution to
(2) with $y = Ax_0$.

2. For every vector $w \neq 0$ in the null space of $A$, and every index set $S \subset \{1, 2, \ldots, n\}$ with
$|S| = k$ such that $w_{S^c} \ge 0$, it holds that

$$\sum_{i=1}^{n} w_i > 0.$$

Here $S^c$ is the complement of $S$ in $\{1, 2, \ldots, n\}$, $w_S$ denotes the sub-vector of $w$ formed
by the elements indexed by $S$, and $|S|$ denotes the cardinality of the set $S$.

Theorem 3.1 is in fact the counterpart of Theorem 1 of [17] for non-negative vectors. It gives
a necessary and sufficient condition on the matrix $A$ such that all $k$-sparse non-negative vectors $x_0$ can be
recovered using (2). The condition is essentially that for every vector in the null space of $A$, the
sum of every $n - k$ non-negative elements should be greater than the absolute sum of the rest.
(This is very similar, but not identical, to the null space property for general signals.) Therefore we call it the
non-negative null space property.
Proof. Suppose $A$ has the non-negative null space property. We assume $x_0$ is $k$-sparse and show
that under this null space condition the solution to (2) is $x_0$. Denote by $x_1$
the solution to (2), and let $S$ be the support set of $x_0$ (if $|S| < k$, enlarge $S$ arbitrarily to a set of size $k$). We can write:

$$\|x_1\|_1 = \|x_0 + (x_1 - x_0)\|_1 = \sum_{i=1}^{n} \big( x_0(i) + (x_1 - x_0)(i) \big) \qquad (3)$$

$$= \|x_0\|_1 + \sum_{i=1}^{n} (x_1 - x_0)(i), \qquad (4)$$

where $x_0(i)$ and $(x_1 - x_0)(i)$ are the $i$th entries of $x_0$ and $x_1 - x_0$, respectively. Equations (3) and
(4) hold because $x_1$ and $x_0$ are both non-negative vectors, so their $\ell_1$ norm is simply the sum
of their entries. Now, if $x_1 \neq x_0$, then since $x_1 - x_0$ is a nonzero vector in the null space of $A$ that is
non-negative on $S^c$ (because $x_0$ vanishes on $S^c$), we can write

$$\sum_{i=1}^{n} (x_1 - x_0)(i) > 0, \qquad (5)$$

which implies

$$\|x_1\|_1 > \|x_0\|_1.$$

But $\|x_1\|_1 \le \|x_0\|_1$ by construction, since $x_1$ minimizes (2) and $x_0$ is feasible. Hence $x_1 = x_0$.

Conversely, suppose there is a nonzero vector $w$ in the null space of $A$ and a subset $S \subset \{1, 2, \ldots, n\}$ of size
$k$ with $w_{S^c} \ge 0$ and $\sum_{i=1}^{n} w_i \le 0$. We construct a non-negative vector $x_0$ supported on $S$ and
show that there exists another non-negative vector $x_1 \neq x_0$ such that $Ax_0 = Ax_1$ and $\|x_1\|_1 \le \|x_0\|_1$.
This means that $x_0$ is not the unique solution to (2) with $y = Ax_0$, which completes the proof.
For simplicity we may assume $S = \{1, 2, \ldots, k\}$. Without loss of generality we write

$$w = \begin{bmatrix} w_S^{+} - w_S^{-} \\ w_{S^c} \end{bmatrix}, \qquad (6)$$

where $w_S^{+}$ and $w_S^{-}$ are both non-negative vectors. Now set

$$x_0 = \begin{bmatrix} w_S^{-} \\ 0 \end{bmatrix}, \qquad x_1 = \begin{bmatrix} w_S^{+} \\ w_{S^c} \end{bmatrix}. \qquad (7)$$

Then $x_0$ and $x_1$ are non-negative, $x_0$ is supported on $S$, $x_1 - x_0 = w \neq 0$ lies in the null space of $A$, and $\|x_1\|_1 - \|x_0\|_1 = \sum_{i=1}^{n} w_i \le 0$. ■
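A small numerical check of the converse construction (6)-(7), in Python with NumPy. The $2 \times 4$ matrix $A$ (non-negative, constant column sum 2) and the null space vector $w$ are toy values we chose; they satisfy $w_{S^c} \ge 0$ and $\sum_i w_i \le 0$ for $S = \{1\}$ (index 0 in code), so the construction produces two feasible points with equal measurements and no larger $\ell_1$ norm.

```python
import numpy as np

def split_null_vector(w, S):
    """Return (x0, x1) as in (6)-(7): x0 = [w_S^-; 0], x1 = [w_S^+; w_{S^c}]."""
    w = np.asarray(w, dtype=float)
    on_S = np.zeros(w.size, dtype=bool)
    on_S[list(S)] = True
    x0 = np.where(on_S, np.maximum(-w, 0.0), 0.0)   # negative part of w on S, zero elsewhere
    x1 = np.where(on_S, np.maximum(w, 0.0), w)      # positive part on S, w itself on S^c
    return x0, x1

# Toy matrix: non-negative entries, every column sums to 2.
A = np.array([[1.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 2.0, 1.0]])
w = np.array([-2.0, 1.0, 0.0, 1.0])   # A @ w = 0, w >= 0 off S, sum(w) = 0 <= 0
S = [0]                               # k = 1, 0-based index

x0, x1 = split_null_vector(w, S)
print(x0, x1)            # x0 = [2, 0, 0, 0], x1 = [0, 1, 0, 1]
print(A @ x0, A @ x1)    # identical measurements
print(x0.sum(), x1.sum())  # ||x1||_1 <= ||x0||_1, so x0 is not the unique l1 solution
```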



In this paper we consider measurement matrices $A$ with the following two
features: (1) the entries are non-negative, and (2) all columns have the same sum. This class of
matrices includes the adjacency matrices of left-regular
bipartite graphs (which have a constant number of ones in each column), as well as their perturbed
versions introduced in Section 3.3. For this class of matrices we show that the condition for
the success of $\ell_1$ recovery is simply the condition for there being a "unique" vector in the constraint
set $\{x \mid Ax = Ax_0, x \ge 0\}$. To this end, we prove the following theorem.
Theorem 3.2. Let $A \in \mathbb{R}^{m \times n}$ be a matrix with non-negative entries and constant column sum.
Then the following three statements are equivalent.

1. For all non-negative $k$-sparse $x_0$, it holds that

$$\{x \mid Ax = Ax_0, x \ge 0\} = \{x_0\}.$$

2. For every vector $w \neq 0$ in the null space of $A$, and every index set $S \subset \{1, 2, \ldots, n\}$ with
$|S| = k$ such that $w_{S^c} \ge 0$, it holds that

$$\sum_{i=1}^{n} w_i > 0.$$

3. For every subset $S \subset \{1, 2, \ldots, n\}$ with $|S| = k$, there exists no vector $w \neq 0$ in the null space
of $A$ such that $w_{S^c} \ge 0$.


Theorems 3.1 and 3.2 show that for the class of matrices with non-negative entries and constant
column sum, the condition for the success of $\ell_1$ recovery is simply the condition for there being a
"unique" vector in the constraint set $\{x \mid Ax = Ax_0, x \ge 0\}$. In this case, any optimization problem over this set,
e.g., $\min_{x \ge 0,\, Ax = y} \|x\|_2$, would also recover the desired $x_0$.

In fact, rather than prove Theorem 3.2 directly, we shall prove the following stronger result (from which
Theorem 3.2 readily follows).

Lemma 3.1. Let $A \in \mathbb{R}^{m \times n}$ be a matrix with non-negative entries and constant column sum, and let
$S \subset \{1, 2, \ldots, n\}$. Then the following three statements are equivalent.

1. For all non-negative $x_0$ supported on $S$ (i.e., whose support is contained in $S$), it holds that

$$\{x \mid Ax = Ax_0, x \ge 0\} = \{x_0\}.$$

2. For every vector $w \neq 0$ in the null space of $A$ such that $w_{S^c} \ge 0$, it holds that

$$\sum_{i=1}^{n} w_i > 0.$$

3. There exists no vector $w \neq 0$ in the null space of $A$ such that $w_{S^c} \ge 0$.

Proof. First, we show that for any non-negative matrix $A$, statements 1 and 3 of Lemma 3.1 are
equivalent. Suppose that condition 3 holds for a specific subset $S \subset \{1, 2, \ldots, n\}$. Consider a
non-negative $n \times 1$ vector $x_0$ supported on $S$. If there exists another non-negative vector $x_1 \neq x_0$ with
$Ax_1 = Ax_0$, then $x_1 - x_0$ is a nonzero vector in the null space of $A$ which is also
non-negative on $S^c$, due to the non-negativity of $x_1$ and the fact that $x_0$ vanishes on $S^c$.
This contradicts condition 3.

The proof of the converse is also straightforward. Suppose condition 1 holds for a specific
subset $S$ and all non-negative vectors $x_0$ supported on $S$, and suppose one can find a nonzero vector
$w$ in the null space of $A$ with $w_{S^c} \ge 0$. As in the proof of Theorem 3.1, we may write $w$ as

$$w = \begin{bmatrix} w_S^{+} - w_S^{-} \\ w_{S^c} \end{bmatrix}, \qquad (8)$$

where $w_S^{-}$ and $w_S^{+}$ are both non-negative vectors. Now if

$$x_0 = \begin{bmatrix} w_S^{-} \\ 0 \end{bmatrix}, \qquad x_1 = \begin{bmatrix} w_S^{+} \\ w_{S^c} \end{bmatrix}, \qquad (9)$$

then $x_0$ and $x_1$ are distinct (since $x_1 - x_0 = w \neq 0$) and both belong to the set $\{x \mid Ax = Ax_0, x \ge 0\}$. This is
a contradiction to condition 1.

So far we have shown that for any non-negative matrix $A$ statements 1 and 3 are
equivalent. Now we show that for matrices with constant column sum statements 2 and
3 are equivalent. We use Lemma 3.2 in Section 3.1, which states that for this special class of matrices
every vector in the null space has entries summing to zero. Therefore,
statement 2 can be true only if there is no nonzero $w$ in the null space of $A$ with $w_{S^c} \ge 0$. Conversely, if
condition 3 holds, then there is no nonzero $w$ in the null space $\mathcal{N}(A)$ such that $w_{S^c}$ is non-negative,
and therefore statement 2 holds vacuously. ■



3.1 Null Space of Adjacency Matrices 

As promised earlier, we now assume that $A$ is the adjacency matrix of a bipartite graph with
$n$ nodes on the left and $m$ nodes on the right. We also assume that the graph is left $d$-regular. In
other words, $A$ is an $m \times n$ matrix with exactly $d$ ones in each column. We now give a series of
results for such matrices. However, we note that, unless stated otherwise, all these results
continue to hold for the class of matrices with non-negative entries and constant column sum (the
interested reader can easily verify this).

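Lemma 3.2. Let $A \in \mathbb{R}^{m \times n}$ be a matrix with non-negative entries and constant column sum $d > 0$. For any
vector $w$ in the null space of $A$, the following is true:

$$\sum_{i=1}^{n} w_i = 0. \qquad (10)$$

Proof. Let $\mathbf{1} = [1, 1, \ldots, 1]^T$ be the $m \times 1$ vector of all 1's. We have:

$$0 = \mathbf{1}^T A w = \sum_{i=1}^{n} d\, w_i, \qquad (11)$$

and since $d > 0$ the claim follows. ■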
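Lemma 3.2 can be checked numerically: build a random non-negative matrix whose columns all sum to $d$, extract a basis of $\mathcal{N}(A)$ from the SVD, and confirm that each basis vector's entries sum to zero. The sizes, the seed and the weight distribution below are arbitrary choices for this Python/NumPy sketch.

```python
import numpy as np

def null_space_basis(A, tol=1e-10):
    """Orthonormal basis of N(A), taken from the right singular vectors (returned as columns)."""
    _, s, vt = np.linalg.svd(A)
    rank = int((s > tol * s.max()).sum())
    return vt[rank:].T

rng = np.random.default_rng(0)
m, n, d = 4, 8, 3
A = np.zeros((m, n))
for j in range(n):
    rows = rng.choice(m, size=d, replace=False)   # d distinct rows per column
    weights = rng.random(d) + 0.1                 # positive weights ...
    A[rows, j] = d * weights / weights.sum()      # ... rescaled so that each column sums to d

N = null_space_basis(A)
print(N.sum(axis=0))  # each entry is ~0 up to round-off, as (10) predicts
```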



Theorem 3.3. For any matrix $A \in \mathbb{R}^{m \times n}$ with exactly $d$ 1's in each column and all other entries zero,
the following two statements are equivalent:

• Every non-negative vector $x_0$ with at most $k$ nonzero entries is the unique solution to (2) with
$y = Ax_0$.

• Every nonzero vector $w$ in the null space of $A$ has at least $k + 1$ negative entries.

Proof. We only need to show that, for nonzero $w$ in the null space $\mathcal{N}(A)$, the second statements of Theorem 3.1
and Theorem 3.3 are equivalent. Assume there exists a nonzero $w \in \mathcal{N}(A)$ with at most
$k$ negative entries. We use $S^+$, $S^-$ and $S^0$ to denote the sets of indices of the positive, negative and zero
entries of $w$, respectively. By Lemma 3.2, $\sum_{i=1}^{n} w_i = 0$. Therefore, if $S$ is any set of size $k$ containing $S^-$ (such a set exists since $|S^-| \le k$), then $w_{S^c} \ge 0$ and $\sum_{i=1}^{n} w_i = 0$,
so the non-negative null space property is not satisfied for the set $S$.

The other direction is straightforward. If every nonzero $w \in \mathcal{N}(A)$ has at least $k + 1$ negative entries, there is no choice
of $S \subset \{1, 2, \ldots, n\}$ of size $k$ with $w_{S^c} \ge 0$, and the non-negative null space property is satisfied
vacuously. ■

These results show how the structure of the null space of the measurement matrix is related 
to the recoverability of sparse vectors. Thus to achieve our primary goal, which is constructing 
optimal sparse measurement matrices, we need to find bipartite graphs with non-negative null 
space properties up to a maximal sparsity (hopefully, proportional to the dimension n). One 
promising choice would be the use of the adjacency matrix of expander graphs. However, rather than 
restrict ourselves to this choice, we present some theorems paraphrasing the null-space property 
and interpreting it in terms of other properties of matrices. This way we show that at some point 
expander graphs inevitably emerge as the best choice, and even further as a necessary condition 
for any measurement matrix. 

3.2 Complete Rank and Natural Expansion 

Before proceeding, let us consider the following two definitions, whose relation to the main topic
will shortly become apparent.

Definition 1. For a matrix $A \in \mathbb{R}^{m \times n}$ we define the Complete Rank of $A$ (denoted by $\mathrm{Cr}(A)$) to be
the maximum integer $r_0$ with the property that every $r_0$ columns of $A$ are linearly independent. In
other words, $\mathrm{Cr}(A) = \min_{w \in \mathcal{N}(A) \setminus \{0\}} \big( |\mathrm{Supp}(w)| - 1 \big)$, where $\mathcal{N}(A)$ is the null space of $A$ and $\mathrm{Supp}(w)$ denotes the support set of
$w$.

This notion is also known in linear algebra as the Kruskal rank (see [27]).
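For small matrices, $\mathrm{Cr}(A)$ can be computed by brute force directly from Definition 1: test column subsets of increasing size and stop at the first linearly dependent one. This Python/NumPy sketch is exponential in $n$ and uses a numerical rank tolerance of our choosing, so it is only meant for toy examples.

```python
from itertools import combinations
import numpy as np

def complete_rank(A, tol=1e-9):
    """Brute-force Cr(A): the largest r such that every r columns of A are linearly independent."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    for r in range(1, min(m, n) + 1):
        for cols in combinations(range(n), r):
            if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < r:
                return r - 1          # all (r-1)-subsets passed, some r-subset is dependent
    return min(m, n)                  # any min(m, n) + 1 columns are dependent anyway

A = np.array([[1, 0, 1],
              [0, 1, 1]])
print(complete_rank(A))  # 2: every pair is independent, the three columns are not
```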

Definition 2. A left-regular bipartite graph $(X, Y, d)$, with $X$ and $Y$ the sets of left and right vertices
($|X| = n$, $|Y| = m$) and $d$ the left degree, is called a $(k, \epsilon)$-unbalanced expander if for every
$S \subseteq X$ with $|S| \le k$ the following holds: $|\Gamma(S)| \ge |S|\, d (1 - \epsilon)$, where $\Gamma(S)$ is the set of neighbors of
$S$ in $Y$.
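Definition 2 can likewise be tested exhaustively on toy graphs. In this Python sketch (our own helper, not from the paper) a left-regular graph is a dict mapping each left vertex to its set of right neighbours; a small tolerance guards against floating-point round-off when $\epsilon = 1 - 1/d$.

```python
from itertools import combinations

def is_expander(adj, k, eps):
    """Brute-force test of Definition 2: |Gamma(S)| >= |S| d (1 - eps) for all |S| <= k.

    adj maps each left vertex to the set of its right neighbours; all left
    vertices are assumed to have the same degree d.
    """
    d = len(next(iter(adj.values())))
    left = list(adj)
    for r in range(1, k + 1):
        for S in combinations(left, r):
            gamma = set().union(*(adj[v] for v in S))
            if len(gamma) < r * d * (1 - eps) - 1e-9:
                return False
    return True

# 4-cycle: left vertex i is joined to right vertices i and i + 1 (mod 4), so d = 2.
cycle = {0: {0, 1}, 1: {1, 2}, 2: {2, 3}, 3: {3, 0}}
print(is_expander(cycle, 4, 1 - 1 / 2))  # minimal expansion 1 - eps = 1/d holds for all sets
```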

The following lemma connects these two notions:

Lemma 3.3. Every bipartite graph with adjacency matrix $A$ and left degree $d$ is a $(\mathrm{Cr}(A), 1 - \frac{1}{d})$
expander.

Proof. If $S \subseteq X$ with $|S| \le \mathrm{Cr}(A)$, then the columns of $A$ corresponding to the elements of $S$ are
linearly independent. So the sub-matrix of $A$ formed by the columns corresponding
to $S$ must have full column rank. Therefore, it must have at least $|S|$ nonzero rows, which is to say

$$|\Gamma(S)| \ge |S| = |S|\, d \Big(1 - \big(1 - \tfrac{1}{d}\big)\Big).$$ ■

A direct corollary of this lemma is that

$$\forall S \subseteq X, \quad |\Gamma(S)| \ge \min\big(|S|, \mathrm{Cr}(A)\big). \qquad (12)$$

The notion of complete rank is tightly related to the expansion property. It is also related to the
null space characterization we are after. The following theorem sheds some light on this
connection.
Theorem 3.4. If $A \in \mathbb{R}^{m \times n}$ is the adjacency matrix of a left $d$-regular bipartite graph, then for every
nonzero vector $w$ in the null space of $A$, the number of negative elements of $w$ is at least $\mathrm{Cr}(A)/d$.

Proof. Let $X$ and $Y$ be the sets of left and right vertices of the bipartite graph corresponding to $A$
($X$ corresponds to the columns of $A$), let $S_w^+$ be the set of vertices in $X$ corresponding to the positive
elements of $w$, and likewise let $S_w^-$ be the set of vertices corresponding to the negative elements.³ Let
$S_w = S_w^+ \cup S_w^-$. The equation $Aw = 0$ can be represented on the graph of $A$, with
each node of $Y$ corresponding to an equation with zero right-hand side. This entails $\Gamma(S_w^+) = \Gamma(S_w^-) = \Gamma(S_w)$,
since otherwise there exists a vertex in $Y$ connected to exactly one of the sets $S_w^+$ or $S_w^-$, and its
corresponding equation cannot sum to zero. On the other hand, from the definition of $\mathrm{Cr}(A)$,
we must have $|S_w| > \mathrm{Cr}(A)$. The number of edges emanating from $S_w^-$ is $d|S_w^-|$, which is at least
as large as the number of its neighbors $|\Gamma(S_w^-)|$. Then

$$d|S_w^-| \ge |\Gamma(S_w^-)| = |\Gamma(S_w)| \ge \mathrm{Cr}(A),$$

where the last inequality is a consequence of (12). ■
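A numerical sanity check of Theorem 3.4 in Python with NumPy: for a random left $d$-regular 0-1 matrix (sizes and seed are arbitrary), sample random vectors from $\mathcal{N}(A)$ and record the smallest number of negative entries observed. The theorem predicts this is at least $\mathrm{Cr}(A)/d$. Sampling cannot certify the bound for all null vectors; it only illustrates it.

```python
from itertools import combinations
import numpy as np

def complete_rank(A, tol=1e-9):
    """Brute-force Kruskal rank (exponential in n; toy sizes only)."""
    m, n = A.shape
    for r in range(1, min(m, n) + 1):
        for cols in combinations(range(n), r):
            if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < r:
                return r - 1
    return min(m, n)

rng = np.random.default_rng(1)
m, n, d = 5, 9, 2
A = np.zeros((m, n))
for j in range(n):
    A[rng.choice(m, size=d, replace=False), j] = 1.0   # left d-regular 0-1 matrix

_, s, vt = np.linalg.svd(A)
rank = int((s > 1e-10).sum())
basis = vt[rank:].T                                    # columns span N(A)
cr = complete_rank(A)

# Random null-space vectors; entries below -1e-9 are counted as negative.
min_neg = n
for _ in range(200):
    w = basis @ rng.standard_normal(basis.shape[1])
    min_neg = min(min_neg, int((w < -1e-9).sum()))
print(cr, min_neg)   # Theorem 3.4: min_neg >= cr / d
```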

We now turn to the task of constructing adjacency matrices with complete rank proportional 
to dimension. Throughout this paper, all the thresholds that we achieve are asymptotic, i.e., they 
hold for the regime of very large n and m. 

3.3 Perturbed Expanders 

When $n$ and $m = \beta n$ are large, we are interested in constructing 0-1 matrices $A \in \mathbb{R}^{m \times n}$ with $d$
(constant) 1's in each column such that $\mathrm{Cr}(A)$ is proportional to $n$. Furthermore, the maximum
achievable value of $\mathrm{Cr}(A)/n$ is critical. This is a very difficult question to address. However, it turns
out to be much easier if we allow for a small perturbation of the nonzero entries of $A$, as shown
next.

Lemma 3.4. Let $A \in \mathbb{R}^{m \times n}$ be the adjacency matrix of a bipartite left $d$-regular
graph. If, for every $r \le r_0$, the submatrix formed by any $r$ columns of $A$ has at least $r$ nonzero rows, then it is
possible to perturb the nonzero entries of $A$ and obtain another non-negative matrix $\tilde{A}$
with $\mathrm{Cr}(\tilde{A}) \ge r_0$. Furthermore, the perturbations can be done in such a way that the sum of
each column remains the constant $d$.

Proof. The proof is based on showing that the set of valid perturbations that do not guarantee
$\mathrm{Cr}(\tilde{A}) \ge r_0$ has measure zero. So, by choosing perturbations uniformly at random from the set of valid perturbations,
with probability one we will have $\mathrm{Cr}(\tilde{A}) \ge r_0$. ■
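Lemma 3.4 only asserts that random perturbations work with probability one. The Python/NumPy sketch below uses one concrete scheme of our own choosing: for each column, draw positive weights uniformly from $[0.5, 1.5]$ and rescale them so the column still sums to its original value $d$. On the 4-cycle adjacency matrix (where every set of $r \le 4$ columns has at least $r$ neighbours) the unperturbed matrix has $\mathrm{Cr}(A) = 3$, because the four columns are dependent, while the perturbed matrix generically reaches $\mathrm{Cr}(\tilde{A}) = 4$.

```python
from itertools import combinations
import numpy as np

def complete_rank(A, tol=1e-9):
    """Brute-force Kruskal rank (toy sizes only)."""
    m, n = A.shape
    for r in range(1, min(m, n) + 1):
        for cols in combinations(range(n), r):
            if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < r:
                return r - 1
    return min(m, n)

def perturb_keep_column_sums(A, rng, low=0.5, high=1.5):
    """Replace the nonzeros of each column by random positive weights with the same column sum."""
    A_t = np.zeros_like(A, dtype=float)
    for j in range(A.shape[1]):
        rows = np.flatnonzero(A[:, j])
        w = rng.uniform(low, high, size=rows.size)
        A_t[rows, j] = A[:, j].sum() * w / w.sum()
    return A_t

# Adjacency matrix of a 4-cycle (d = 2): columns 0 + 2 equal columns 1 + 3.
A = np.array([[1, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]], dtype=float)
A_t = perturb_keep_column_sums(A, np.random.default_rng(0))
print(complete_rank(A), complete_rank(A_t))
```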

Remark 1. It is worth mentioning that in a more practical scenario, what we really need is that
every $r_0$ columns of $\tilde{A} = A + E$ be not only linearly independent, but "sufficiently" independent. In other
words, if the minimum singular value of the submatrix formed by any $r_0$ columns of $\tilde{A}$ is greater than
a constant $c$ (which does not depend on $n$), then that is what we recognize as the RIP-2
condition for $\tilde{A}$. This condition then guarantees that the solution to $\ell_1$-minimization is robust, in
an $\ell_2$-norm sense, to noise. Besides, in order to
have a recovery algorithm which can be implemented with reasonable precision, it is important that
we have such a well-conditioned statement on the measurement matrix. We will not delve into the
details of this issue here. However, we conjecture that by leveraging ideas from perturbation theory,
it is possible to show that $\tilde{A}$ maintains some sort of RIP-2 condition. In particular, it has been
shown in [26] that random dense perturbations $\tilde{A} = A + E$ force the small singular values to
increase, and that the increment is proportional to $\sqrt{m}$. In our case, where $E$ is itself a sparse matrix,
we conjecture that the increment is $O(1)$.

³We interchangeably use $S$ and its variations to denote a set of vertices or the support set of a vector.

$\tilde{A}$ corresponds to the same bipartite graph as $A$. However, the edges are labeled with positive,
generally non-integer weights rather than the unit weights of $A$, and the weights of all the edges
emanating from any node in $X$ still sum to $d$. It is worth noting that, after
modifying $A$ by the perturbations described above, Theorem 3.1, Lemma 3.2 and Theorems 3.3
and 3.4 all continue to hold for this class of matrices $\tilde{A}$. Therefore $\mathrm{Cr}(\tilde{A}) \ge r_0$ guarantees
perfect recovery of $k$-sparse vectors via $\ell_1$-minimization whenever $k + 1 \le r_0/d$: by Theorem 3.4 every nonzero null space vector has at least $r_0/d$ negative entries, which is what Theorem 3.3 requires. Moreover, the fact that $\mathrm{Cr}(\tilde{A}) \ge r_0$ can
be translated back as if $A$ were an $(r_0, 1 - \frac{1}{d})$ unbalanced expander graph. Therefore what we really
care about is constructing $(r_0, 1 - \frac{1}{d})$ expanders with $r_0/n$ as large as possible. In Section 4 we use a
probabilistic method to show that the desired $(r_0 = \mu n, 1 - \frac{1}{d})$ expanders exist and give thresholds on
$\mu$. Before continuing, note that we are using a $1 - \epsilon = \frac{1}{d}$ expansion coefficient for perfect recovery,
which is very small compared to other schemes that use expanders (see, e.g., [11, 13])
and require expansion coefficients larger than $1 - \epsilon > 3/4$. The value $1 - \epsilon = \frac{1}{d}$ is indeed the critical
expansion coefficient. We briefly digress in a subsection to discuss this further.

3.4 Necessity of Expansion 

Consider any $m \times n$ measurement matrix $A$ that allows the recovery of all $r_0$-sparse vectors, and
construct its corresponding bipartite graph $B(n, m)$ by placing an edge between the left node of column $j$ and the right node of row $i$ whenever
$A_{ij}$ is nonzero. We show that any such matrix must correspond to an expander bipartite
graph. The intuition is that a contracting set is a certificate that a submatrix is rank-deficient,
and hence reconstruction is impossible. In particular, for the case where each column of $A$ has $d$
non-zero entries, we obtain the following statement:

Lemma 3.5. Any $m \times n$ measurement matrix $A$ with $d$ non-zeros per column that allows the
recovery of all $r_0$-sparse vectors must correspond to an $(r_0, 1 - \frac{1}{d})$ bipartite expander graph.

Proof. The statement holds for any recovery algorithm (even $\ell_0$ minimization), because we show
that even a stronger recovery algorithm that is given the support of the $r_0$-sparse vector will fail
to recover it. Assume that the bipartite graph is not an $(r_0, 1 - \frac{1}{d})$ expander, i.e., there exists a set
of $r \le r_0$ columns that is adjacent to $r - 1$ (or fewer) rows. Then the submatrix corresponding
to these $r$ columns must have rank strictly smaller than $r$, regardless of the values of the non-zero entries.
Considering sparse signals supported exactly on these $r$ columns, we see that it is
impossible for any algorithm to recover all of them, even if the support is known, since there is rank loss in
the corresponding measurement submatrix.

This argument is easily extended to the non-regular case where the number of non-zeros in
every column is arbitrary. The key necessary condition is that every set of $r \le r_0$ columns has a
neighborhood of size $r$ or greater. This is exactly the condition of Hall's marriage theorem, which
guarantees a perfect matching for every subset of size up to $r_0$. This perfect matching, combined
with perturbations, suffices to ensure that all the submatrices are full rank. ■
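The rank-loss argument in the proof of Lemma 3.5 is easy to see numerically. In this Python/NumPy sketch a hypothetical contracting set of $r = 3$ columns touches only 2 rows; whatever values the nonzeros take, the submatrix has rank at most 2, and a null vector of the submatrix yields two different positive signals on the same support with identical measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r = 6, 3
# Hypothetical contracting set: r = 3 columns whose non-zeros all lie in rows {0, 1}
# (only r - 1 = 2 neighbours). The non-zero values are arbitrary.
sub = np.zeros((m, r))
sub[[0, 1], :] = rng.uniform(0.1, 2.0, size=(2, r))
print(np.linalg.matrix_rank(sub))   # at most r - 1 = 2, whatever the values

# A unit vector v with sub @ v ~ 0 gives two positive signals on the same support
# with identical measurements, so even a decoder told the support cannot separate them.
_, _, vt = np.linalg.svd(sub)
v = vt[-1]
x_a = 2.0 + v                       # entries lie in [1, 3] because |v_i| <= 1
x_b = 2.0 - v
print(np.allclose(sub @ x_a, sub @ x_b))
```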
Gaussian and other dense measurement matrices correspond to completely connected bipartite
graphs, which obviously have the necessary expansion. Therefore, the previous lemma becomes
interesting when one tries to construct sparse measurement matrices.

We can easily show that larger expansion is also sufficient (but not necessary): given the adjacency
matrix of a general $(k, \epsilon)$-expander, can we guarantee that, with appropriate perturbations,
$\ell_1$-optimization will allow the recovery of $f(k, \epsilon)$-sparse vectors for some positive function $f(\cdot, \cdot)$?
The answer is yes, and the proof leverages the fact that every $(k, \epsilon)$ expander can be interpreted as
a $(k', 1 - \frac{1}{d})$-expander for some $k' \ge k$.

Lemma 3.6. If $\epsilon < 1 - \frac{1}{d}$ and $k > 0$, then every left $d$-regular $(k, \epsilon)$-expander is also a $(k(1 - \epsilon)d, 1 - \frac{1}{d})$-expander.

Proof. Consider a subset $S \subseteq X$ of the nodes with $|S| \le k(1 - \epsilon)d$. If $|S| \le k$, then from the
$(k, \epsilon)$-expansion, $|\Gamma(S)| \ge (1 - \epsilon)d|S| \ge |S|$. If $k < |S| \le k(1 - \epsilon)d$, then by choosing an arbitrary
subset $S' \subset S$ with $|S'| = k$ and using the $(k, \epsilon)$-expansion on $S'$, we have

$$|\Gamma(S)| \ge |\Gamma(S')| \ge k(1 - \epsilon)d \ge |S|.$$ ■

Recall that an $(r_0, 1 - \frac{1}{d})$-expander, after suitable perturbation, is capable of recovering all non-negative vectors with sparsity up to roughly $r_0/d$.
This and Lemma 3.6 immediately imply:

Theorem 3.5. If $A$ is the adjacency matrix of an unbalanced left $d$-regular $(k, \epsilon)$-expander graph,
then there exists a perturbation of $A$ in the nonzero entries resulting in a non-negative matrix $\tilde{A}$
such that every non-negative $k(1 - \epsilon)$-sparse vector $x$ can be recovered from $y = \tilde{A}x$ without error
using $\ell_1$-optimization.

This is an improvement over the existing bounds. For example, [9] guarantees that $k/2$-sparse vectors
can be reconstructed via $\ell_1$-optimization using $(k, \epsilon)$ expanders with $\epsilon < 1/6$. Using the above theorem
with $\epsilon = 1/6$, $5k/6$-sparse non-negative vectors can be perfectly recovered by linear programming.
Likewise, [11] provides an algorithm that recovers vectors with sparsity $k/2$ using $(k, \epsilon)$ expander
graphs with $\epsilon < 1/4$. Our theorem for $\epsilon = 1/4$ allows for the recovery of $3k/4$-sparse positive signals.
Note that these bounds are very conservative; in fact, the expanding sets are much bigger when smaller
expansion is required, which yields much bigger provably recoverable sets.
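The arithmetic behind Lemma 3.6 and Theorem 3.5 is simple enough to check with exact rationals: a $(k, \epsilon)$ expander is also a $(k(1-\epsilon)d, 1 - 1/d)$ expander, which after perturbation gives recovery of $k(1-\epsilon)$-sparse non-negative vectors. This Python sketch (helper names are ours) evaluates both quantities for the illustrative values $\epsilon = 1/6$ and $\epsilon = 1/4$.

```python
from fractions import Fraction

def recoverable_fraction(eps):
    """Theorem 3.5: a (k, eps) expander yields recovery of k(1 - eps)-sparse vectors; returns 1 - eps."""
    return 1 - Fraction(eps)

def induced_expander_size(k, eps, d):
    """Lemma 3.6: a (k, eps) expander is also a (k(1 - eps)d, 1 - 1/d) expander."""
    eps = Fraction(eps)
    if not eps < 1 - Fraction(1, d):
        raise ValueError("Lemma 3.6 needs eps < 1 - 1/d")
    return k * (1 - eps) * d

for eps in (Fraction(1, 6), Fraction(1, 4)):
    print(eps, recoverable_fraction(eps))   # 1/6 -> 5/6, 1/4 -> 3/4
```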



4 Existence of Sparse Matrices with Linear Complete Rank 

For fixed values of $n > m > r_0$ and $d$, we are interested in the existence of $(r_0, \epsilon = 1 - \frac{1}{d})$
expanders with constant left degree $d$. We use the standard first moment method to prove
the existence of such an expander for appropriate ratios of $n$, $m$, $r_0$ and $d$. The main result is
given below, while the complete proof can be found in Appendix A.

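Theorem 4.1. For sufficiently large $n$, with $m = \beta n$ and $r_0 = \mu n$, there exists a bipartite graph
with $n$ left vertices and $m$ right vertices which is an $(r_0, 1 - \frac{1}{d})$ expander, if

$$d > \frac{H(\mu) + \beta H\!\left(\frac{\mu}{\beta}\right)}{\mu \log\!\left(\frac{\beta}{\mu}\right)}. \qquad (13)$$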
More important is the question of how large the ratio $\frac{k}{n}$ can be, since we proved earlier that this method recovers vectors with sparsity up to roughly $\frac{k}{n} = \frac{\mu}{d}$. Figure 1a illustrates the maximum achievable ratio for different values of $\beta$, derived from (13).
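Condition (13) is straightforward to evaluate numerically. The sketch below computes its right-hand side, with entropies and logarithms in base 2, and scans a grid of $\mu \in (0, \beta)$ and integer degrees $d \ge 3$ (the range used in Appendix A) for the largest $\mu/d$. It assumes the finite-$n$ corrections of Appendix A can be ignored; the function names, the grid resolution and the cap `d_max` are arbitrary illustrative choices, not values from the paper.

```python
import math


def H(x):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def degree_threshold(beta, mu):
    """Right-hand side of (13); requires 0 < mu < beta."""
    return (H(mu) + beta * H(mu / beta)) / (mu * math.log2(beta / mu))


def best_ratio(beta, d_max=30, grid=2000):
    """Largest mu/d over integers 3 <= d <= d_max and mu on a grid in (0, beta) satisfying (13).

    Returns the triple (mu/d, mu, d)."""
    best = (0.0, None, None)
    for d in range(3, d_max + 1):
        for j in range(1, grid):
            mu = beta * j / grid
            if d > degree_threshold(beta, mu) and mu / d > best[0]:
                best = (mu / d, mu, d)
    return best


print(best_ratio(0.5))
```

The ratio in (13) does not depend on the logarithm base as long as $H$ and $\log$ use the same base, so base 2 is only a convention here.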



4.1 Weak bound 

We are now interested in deriving conditions for recovering a specific support set $S$ of size $k = \alpha n$, rather than obtaining a worst-case bound for matrices that work for all support sets. Recall that $m = \beta n$ and the left degree is $d$, and define $\gamma_1 := (1 - e^{-d\alpha/\beta})\beta$.

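Theorem 4.2. Define the function

$$F(p_1,p_2) := \alpha H\left(\frac{p_1}{\alpha}\right) + (1-\alpha) H\left(\frac{p_2}{1-\alpha}\right) + \beta H\left(\frac{p_1+p_2}{\beta}\right) + d(p_1+p_2)\log\left(\frac{p_1+p_2}{\beta}\right). \tag{14}$$

For every $\alpha$ such that $F(p_1,p_2) < 0$ for every $p_1, p_2$ satisfying $p_1 \le \alpha$, $p_2 \le 1-\alpha$, $p_1 + p_2 \le \gamma_1$, a randomly selected subset of size $k = \alpha n$ is recoverable from a random perturbed matrix $\tilde{A}$ with probability $1 - o(1)$.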
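The weak threshold of Theorem 4.2 can be estimated numerically by checking the sign of $F$ from (14) on a grid of the region $p_1 \le \alpha$, $p_2 \le 1-\alpha$, $p_1 + p_2 \le \gamma_1$. The sketch below is a hedged illustration, not the procedure used for Figure 1b: it uses base-2 logarithms, skips points with $p_1 + p_2$ below an arbitrary cutoff `p_min` (sublinear sets are handled separately in Appendix B), and locates the largest admissible $\alpha$ by bisection, which assumes the condition is monotone in $\alpha$. All function names are hypothetical.

```python
import math


def H(x):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def F(p1, p2, alpha, beta, d):
    """Exponent F(p1, p2) of (14), with logarithms in base 2."""
    p = p1 + p2
    return (alpha * H(p1 / alpha) + (1 - alpha) * H(p2 / (1 - alpha))
            + beta * H(p / beta) + d * p * math.log2(p / beta))


def condition_holds(alpha, beta, d, grid=100, p_min=1e-3):
    """Grid check of F < 0 on {p1 <= alpha, p2 <= 1 - alpha, p_min <= p1 + p2 <= gamma_1}."""
    gamma1 = (1 - math.exp(-d * alpha / beta)) * beta
    for i in range(grid + 1):
        for j in range(grid + 1):
            p1, p2 = alpha * i / grid, (1 - alpha) * j / grid
            if p_min <= p1 + p2 <= gamma1 and F(p1, p2, alpha, beta, d) >= 0:
                return False
    return True


def weak_threshold(beta, d, tol=1e-3):
    """Bisection on alpha in (0, 0.99); assumes the condition is monotone in alpha."""
    lo, hi = 0.0, 0.99
    if condition_holds(hi, beta, d):
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if condition_holds(mid, beta, d):
            lo = mid
        else:
            hi = mid
    return lo


print(weak_threshold(0.5, 3))
```

Because bisection only moves `lo` to values that passed the grid check, the returned $\alpha$ always satisfies the (discretized) condition even if monotonicity fails; it may then just be smaller than the true largest admissible value.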

The bound that results from Theorem 4.2 is plotted in Figure 1b and compared to the strong threshold achieved previously. The full proof of this statement is given in Appendix B. The key argument is a matching condition for the recoverability of vectors supported on a specific subset $S$. The condition involves looking at the two-hop graph from $S$ and checking whether all sets of size up to $|\Gamma(S)| + 1$ have a perfect matching:

Lemma 4.1. Given a set $S$, consider $\Gamma(S)$ and denote $S_2 = \Gamma(\Gamma(S)) \setminus S$. Let the bipartite two-hop graph of $S$ be denoted by $B_S = (S \cup S_2, \Gamma(S \cup S_2))$. Any non-negative vector $x_0$ supported on $S$ can be recovered from $y = Ax_0$ if every subset $S' \subset S \cup S_2$ of size $|S'| \le |\Gamma(S)| + 1$ has minimal expansion: $|\Gamma(S')| \ge |S'|$.

Proof. Consider the two-hop bipartite graph of $S$ and let $C = X \setminus (S \cup S_2)$ denote the remainder of the nodes in $X$. Further, let $A_S$ denote the submatrix corresponding to $B_S$. By Hall's theorem, since every subset of $B_S$ of size up to $|\Gamma(S)| + 1$ has expansion at least equal to its size, it must also have a perfect matching, and by the same perturbation argument $\mathrm{Cr}(A_S) > |\Gamma(S)|$.

By our null space characterization, to show that a set $S$ can be recovered, it suffices to show that no nonzero vector $w$ in the null space of $A$ can have all its negative components in $S$. Assume otherwise: that some nonzero $w$ has its negative support $S_w^- \subseteq S$. Observe now that $C$ cannot contain any of the positive support of $w$, because every equation that is adjacent to a positive element must also be adjacent to a negative one (since the matrix coefficients are positive), and $\Gamma(S_w^-) \subseteq \Gamma(S)$ does not intersect $\Gamma(C)$. Therefore the whole support $S_w$ of $w$ must be contained in $S \cup S_2$.

Now we show that $|S_w| \le |\Gamma(S)|$. Assume otherwise, that $|S_w| > |\Gamma(S)|$. Then we could select a subset $K \subseteq S_w$ such that $|K| = |\Gamma(S)| + 1$. This set $K$ satisfies our assumption and is contained in $B_S$, and therefore must have the minimal expansion $|\Gamma(K)| \ge |K| = |\Gamma(S)| + 1$. But since $\Gamma(S_w^-) = \Gamma(S_w^+) = \Gamma(S_w)$, $K \subseteq S_w$ and $S_w^- \subseteq S$, it must hold that $|\Gamma(K)| \le |\Gamma(S)|$, which contradicts the minimal expansion inequality.

Therefore $S_w$ must have a perfect matching, which means that we can find a full-rank square submatrix $A_w$ (corresponding to that matching) such that $A_w w_{S_w} = 0$, where $w_{S_w}$ denotes the vector $w$ restricted to its support. Since $A_w$ is full rank, $w$ must be the all-zeros vector, which contradicts the assumption that $w$ is a nonzero vector whose negative support is contained in $S$. ■
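The sufficient condition of Lemma 4.1 is purely combinatorial, so on tiny graphs it can be checked by enumerating every subset $S' \subset S \cup S_2$ with $|S'| \le |\Gamma(S)| + 1$. The sketch below is illustrative only: `gamma` and `two_hop_condition` are hypothetical helper names, the two toy graphs are made up, and the running time is exponential in $|\Gamma(S)|$.

```python
from itertools import combinations


def gamma(adj, S):
    """Gamma(S) for a set S of left nodes; adj[i] is the set of right neighbors of node i."""
    out = set()
    for i in S:
        out |= adj[i]
    return out


def two_hop_condition(adj, S):
    """Brute-force check of the sufficient condition of Lemma 4.1 for a support set S."""
    S = set(S)
    GS = gamma(adj, S)
    # S2 = Gamma(Gamma(S)) \ S: left nodes outside S sharing a right neighbor with S.
    S2 = {i for i in range(len(adj)) if i not in S and adj[i] & GS}
    nodes = sorted(S | S2)
    for size in range(1, min(len(GS) + 1, len(nodes)) + 1):
        for Sp in combinations(nodes, size):
            if len(gamma(adj, Sp)) < size:   # violation of |Gamma(S')| >= |S'|
                return False
    return True


# Hypothetical toy graphs with left degree 2.
cycle = [{i, (i + 1) % 6} for i in range(6)]
print(two_hop_condition(cycle, {0}))                      # True
print(two_hop_condition([{0, 1}, {0, 1}, {0, 1}], {0}))   # False: three columns share two rows
```

In the second graph the vector $w = (-1, 1, 0)$ lies in the null space of the unperturbed matrix and has its negative part in $S = \{0\}$, which is exactly the failure mode the lemma rules out.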



5 Fast Algorithm 

We now describe a fast algorithm for the recovery of sparse non-negative vectors from noiseless measurements. This algorithm relies on the minimal expansion described in Section 3.4. We employ a $(kd+1, 1-\frac{1}{d})$ expander and perturb it as in Lemma 3.4 to obtain a sparse nonnegative matrix $A$ with $\mathrm{Cr}(A) \ge kd+1$. Knowing that the target signal is at most $k$-sparse, the algorithm works as follows:

Algorithm 1. Reverse Expansion Recovery

1. Find the zero entries of $y$ and denote them by $y_1$. Also denote by $T_1$ the index set of the elements of $y_1$ in $y$, and by $T_2$ its complement. Without loss of generality, assume that $y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$.

2. Locate in $X$ the neighbors of the set of nodes in $Y$ corresponding to $T_1$, name this set $S_1$, and name the set of their complement nodes in $X$ by $S_2$.

3. Identify the sub-matrix of $A$ that represents the edges emanating from $S_2$ to $T_2$. Call this sub-matrix $A_2$. Columns of $A_2$ correspond to nodes in $S_2$, and its rows correspond to the nodes in $T_2$.

4. Set $\hat{x}_{S_1} = 0$ and compute $\hat{x}_{S_2} = A_2^{\dagger} y_2$, where $A_2^{\dagger}$ is the pseudo-inverse of $A_2$, defined by $A_2^{\dagger} = (A_2^T A_2)^{-1} A_2^T$. Declare $\hat{x}$ as the output.

The algorithm begins by identifying a large zero portion of the measurement vector $y$ and locating the corresponding nodes in $Y$ (refer to Figure 4 in Appendix C). In the next step, the neighbors of these nodes are found in $X$, and these two large sets of nodes are eliminated from $X$ and $Y$. Having done this, we are left with a much smaller system of linear equations, which turns out to be over-determined, and therefore our problem reduces to solving linear equations. The theoretical justification of this statement, and why the algorithm works, is provided in Appendix C.
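The four steps of Algorithm 1 map directly onto a few NumPy operations. The sketch below is a hedged illustration, not the authors' implementation: the function name, the zero tolerance `tol`, and the test instance (a random left-$d$-regular graph whose nonzero entries are drawn uniformly from $[1, 1.5]$ as a stand-in for the perturbation) are assumptions. The random graph is not verified to be a $(kd+1, 1-\frac{1}{d})$ expander, and `np.linalg.pinv` (SVD-based) replaces the explicit formula $(A_2^T A_2)^{-1} A_2^T$; the two agree when $A_2$ has full column rank.

```python
import numpy as np


def reverse_expansion_recovery(A, y, tol=1e-10):
    """Sketch of Algorithm 1 for noiseless y = A x with x >= 0 and k-sparse.

    A is an m x n nonnegative matrix; tol decides which entries of y count as zero."""
    m, n = A.shape
    T1 = np.abs(y) <= tol                  # step 1: zero measurements
    T2 = ~T1
    S1 = (A[T1, :] != 0).any(axis=0)       # step 2: left nodes adjacent to T1
    S2 = ~S1
    A2 = A[np.ix_(T2, S2)]                 # step 3: edges from S2 to T2
    x_hat = np.zeros(n)                    # step 4: x_S1 = 0 ...
    if S2.any():
        x_hat[S2] = np.linalg.pinv(A2) @ y[T2]   # ... and x_S2 = pinv(A2) y2
    return x_hat


# Hypothetical test instance: random left-d-regular graph with random positive weights.
rng = np.random.default_rng(0)
m, n, d, k = 100, 200, 6, 3
A = np.zeros((m, n))
for j in range(n):
    rows = rng.choice(m, size=d, replace=False)
    A[rows, j] = rng.uniform(1.0, 1.5, size=d)
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.uniform(1.0, 5.0, size=k)
x_hat = reverse_expansion_recovery(A, A @ x)
print(np.allclose(x_hat, x))
```

Only $A_2$, which has at most $kd$ rows, is ever inverted, which is where the speed advantage over a full linear program comes from.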
5.1 Noisy Case, Robust Algorithm



In general, the observation vector is contaminated with measurement noise, usually of significantly smaller power, leading to the following equation:

$$y = Ax + v. \tag{15}$$

As before, it is assumed in (15) that $x$ is sparse, and $v$ is the $m \times 1$ observation noise vector with limited $\ell_1$ norm. In practice $v$ is often characterized by its $\ell_2$ norm. However, in order to establish a recovery scheme that is robust to limited-power noise, we would need a measurement matrix with a 2-RIP. This is not true in general for $(0,1)$ expander graphs, although we conjecture that it is realizable via a suitable perturbation for this class of matrices. For the scope of this paper, we therefore assume that the noise is limited through its $\ell_1$ norm. This allows us to use the 1-RIP bounds of [6] to analyze the performance of our scheme in the presence of noise. Again, we assume that $x$ is $k$-sparse.



Algorithm 2. Noisy Case

1. Sort the elements of $y$ by absolute value, pick the smallest $m - kd$ ones and stack them in a vector denoted by $y_1$. Also denote by $T_1$ the index set of the elements of $y_1$ in $y$, and by $T_2$ its complement. Without loss of generality, assume that $y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$.

2. Locate in $X$ the neighbors of the set of nodes in $Y$ corresponding to $T_1$, name them $S_1$, and name the set of their complement nodes in $X$ by $S_2$.

3. Identify the sub-matrix of $A$ that represents the edges emanating from $S_2$ to $T_2$, referred to as $A_2$. Columns of $A_2$ correspond to nodes in $S_2$, and its rows correspond to the nodes in $T_2$.

4. Set $\hat{x}_{S_1} = 0$ and $\hat{x}_{S_2} = \arg\min_{z} \|A_2 z - y_2\|_1$, and declare $\hat{x}$ as the output.
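Step 4 of Algorithm 2 is an $\ell_1$ regression, which can be written as a linear program in the variables $(z, t)$: minimize $\sum_i t_i$ subject to $-t \le A_2 z - y_2 \le t$. The sketch below solves it with SciPy's `linprog` (HiGHS backend); this LP reformulation is a standard technique chosen here for illustration and is not necessarily the solver the authors used. The function names, the plain 0/1 test matrix and the Gaussian noise level are assumptions.

```python
import numpy as np
from scipy.optimize import linprog


def l1_regression(A2, y2):
    """argmin_z ||A2 z - y2||_1 as an LP in (z, t): minimize sum(t) s.t. -t <= A2 z - y2 <= t."""
    r, s = A2.shape
    c = np.concatenate([np.zeros(s), np.ones(r)])
    eye = np.eye(r)
    A_ub = np.block([[A2, -eye], [-A2, -eye]])
    b_ub = np.concatenate([y2, -y2])
    bounds = [(None, None)] * s + [(0, None)] * r
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:s]


def noisy_recovery(A, y, k, d):
    """Sketch of Algorithm 2 for noisy measurements y = A x + v."""
    m, n = A.shape
    T1 = np.zeros(m, dtype=bool)
    T1[np.argsort(np.abs(y))[: m - k * d]] = True   # step 1: the m - kd smallest |y_i|
    T2 = ~T1
    S2 = ~(A[T1, :] != 0).any(axis=0)               # step 2: complement of Gamma(T1)
    A2 = A[np.ix_(T2, S2)]                          # step 3
    x_hat = np.zeros(n)                             # step 4: x_S1 = 0 ...
    if S2.any():
        x_hat[S2] = l1_regression(A2, y[T2])        # ... and l1 regression on S2
    return x_hat


# Hypothetical instance: plain 0/1 left-d-regular matrix, small Gaussian noise.
rng = np.random.default_rng(1)
m, n, d, k = 100, 200, 6, 3
A = np.zeros((m, n))
for j in range(n):
    A[rng.choice(m, size=d, replace=False), j] = 1.0
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.uniform(1.0, 5.0, size=k)
v = rng.normal(0.0, 1e-3, size=m)
x_hat = noisy_recovery(A, A @ x + v, k, d)
print(np.abs(x - x_hat).sum(), np.abs(v).sum())
```

The only change from the noiseless sketch is steps 1 and 4: zero tests become "smallest $m - kd$ magnitudes", and the pseudo-inverse becomes an $\ell_1$ fit, which matches the $\ell_1$ noise model assumed above.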

We show in the following theorem that our algorithm is robust (in an $\ell_1$-norm sense) to observation noise.

Theorem 5.1. If $A$ is the adjacency matrix of a $(k,\epsilon)$-expander with $\epsilon < 0.5$, $x$ is a $k$-sparse nonnegative vector and $\hat{x}$ is the output of Algorithm 2, then $\|x - \hat{x}\|_1 \le \frac{6-4\epsilon}{1-2\epsilon}\|v\|_1$.

Proof. Given in Appendix E. ■



6 Experimental Evaluation 

We generated a random $m \times n$ matrix $A$ with $n = 2m = 500$ and $d = 3$ ones in each column. We then multiplied random sparse vectors with different sparsity levels by $A$ and tried recovering them via the linear program (2). Next, we added the perturbations described in Section 3 to $A$ and applied the same sparse vectors to compare the recovery percentages in the two cases. This process was repeated for a few realizations of $A$, and the best of the improvements we obtained is illustrated in Figure 2a.
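The experiment described above can be sketched as follows, under explicit assumptions: the perturbation model (each nonzero entry multiplied by $1 + U(-0.1, 0.1)$), the nonzero values of the test vectors, the matrix size (half of the one used in the text, to keep the run short), and the success tolerance `atol` are all hypothetical choices, and the non-negative $\ell_1$ program is solved with SciPy's `linprog`. The sketch does not reproduce Figure 2a; it only shows how such a comparison can be set up, using the same sparse vectors for both matrices.

```python
import numpy as np
from scipy.optimize import linprog


def random_left_regular(m, n, d, rng):
    """0/1 matrix with exactly d ones per column, rows chosen uniformly without replacement."""
    A = np.zeros((m, n))
    for j in range(n):
        A[rng.choice(m, size=d, replace=False), j] = 1.0
    return A


def perturb(A, scale, rng):
    """Hypothetical perturbation: multiply each nonzero entry by 1 + U(-scale, scale)."""
    return A * (1.0 + scale * rng.uniform(-1.0, 1.0, size=A.shape))


def lp_recover(A, y):
    """Nonnegative l1 minimization: minimize sum(x) subject to A x = y, x >= 0."""
    n = A.shape[1]
    res = linprog(np.ones(n), A_eq=A, b_eq=y, bounds=[(0, None)] * n, method="highs")
    return res.x if res.success else None


def recovery_rates(matrices, k, trials, rng, atol=1e-6):
    """Fraction of random nonnegative k-sparse vectors recovered exactly by each matrix
    (the same vectors are used for every matrix)."""
    n = matrices[0].shape[1]
    hits = [0] * len(matrices)
    for _ in range(trials):
        x = np.zeros(n)
        x[rng.choice(n, size=k, replace=False)] = rng.uniform(0.5, 1.5, size=k)
        for idx, A in enumerate(matrices):
            x_hat = lp_recover(A, A @ x)
            if x_hat is not None and np.allclose(x_hat, x, atol=atol):
                hits[idx] += 1
    return [h / trials for h in hits]


rng = np.random.default_rng(2)
A0 = random_left_regular(125, 250, 3, rng)
A1 = perturb(A0, 0.1, rng)
for k in (5, 15, 25):
    print(k, recovery_rates([A0, A1], k, 10, rng))
```

For the unperturbed 0/1 matrix every column sums to $d$, so $\sum_i x_i = \frac{1}{d}\sum_i y_i$ is constant on the feasible set and the LP objective cannot distinguish feasible points; exact recovery then happens only when the feasible set contains a single vector.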

In Figure 2b we plot the recovery percentage of Algorithm 1 for a random perturbed expander adjacency matrix $A$ of size $250 \times 500$ with $d = 6$, and compare its performance with the $\ell_1$-minimization method. Although the deterministic theoretical bounds of the two methods are the same, Figure 2b shows that in practice $\ell_1$-minimization is more effective for less sparse signals. However, Algorithm 1 is considerably faster than linear programming and easier to implement.

Figure 2: Percentage of successful recovery for $\ell_1$-minimization and Algorithm 1 with minimal expander measurements, and reconstruction error of Algorithm 2. (a) $\ell_1$-minimization on expanders and perturbed expanders.

In general, the complexity of Algorithm 1 is $O(nk^2)$, which, when $k$ is proportional to $n$, is similar to that of linear programming, $O(n^3)$. However, the constants are much smaller, which is of practical advantage. Furthermore, by taking advantage of fast matrix inversion algorithms for very sparse matrices, Algorithm 1 can be performed with dramatically fewer operations. Figure 3 shows the Signal to Error Ratio as a function of the Signal to Noise Ratio when Algorithm 2 (with the $\ell_2$ norm used in step 4) is used to recover noisy observations. Assuming that the output of the algorithm is $\hat{x}$, the Signal to Noise Ratio (SNR) and Signal to Error Ratio (SER) are defined as



$$\mathrm{SNR} = 10\log_{10}\frac{\|Ax\|_1}{\|v\|_1}, \tag{16}$$

$$\mathrm{SER} = 10\log_{10}\frac{\|x\|_1}{\|x - \hat{x}\|_1}. \tag{17}$$



Measurement matrices are the same as before. 
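A noisy-recovery run of the kind summarized in Figure 3 can be sketched as below. This is an illustration under assumptions, not the authors' script: it takes the $\ell_1$-norm form of (16) and (17), replaces step 4 of Algorithm 2 by a least-squares fit (the $\ell_2$ variant mentioned above, via `np.linalg.lstsq`), and uses a hypothetical perturbation with nonzero entries in $[0.9, 1.1]$, hypothetical signal values, and arbitrary noise levels `sigma`.

```python
import numpy as np


def snr_db(A, x, v):
    """SNR as in (16), assuming l1 norms: 10 log10(||A x||_1 / ||v||_1)."""
    return 10 * np.log10(np.abs(A @ x).sum() / np.abs(v).sum())


def ser_db(x, x_hat):
    """SER as in (17), assuming l1 norms: 10 log10(||x||_1 / ||x - x_hat||_1)."""
    return 10 * np.log10(np.abs(x).sum() / np.abs(x - x_hat).sum())


def algorithm2_l2(A, y, k, d):
    """Algorithm 2 with the l2 norm in step 4, i.e. a least-squares fit on S2."""
    m, n = A.shape
    T1 = np.zeros(m, dtype=bool)
    T1[np.argsort(np.abs(y))[: m - k * d]] = True   # the m - kd smallest |y_i|
    S2 = ~(A[T1, :] != 0).any(axis=0)               # complement of Gamma(T1)
    x_hat = np.zeros(n)
    if S2.any():
        A2 = A[np.ix_(~T1, S2)]
        x_hat[S2] = np.linalg.lstsq(A2, y[~T1], rcond=None)[0]
    return x_hat


rng = np.random.default_rng(3)
m, n, d, k = 250, 500, 6, 5
A = np.zeros((m, n))
for j in range(n):
    # hypothetical perturbation: nonzero entries drawn uniformly from [0.9, 1.1]
    A[rng.choice(m, size=d, replace=False), j] = rng.uniform(0.9, 1.1, size=d)
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.uniform(1.0, 5.0, size=k)
for sigma in (1e-1, 1e-2, 1e-3):
    v = rng.normal(0.0, sigma, size=m)
    x_hat = algorithm2_l2(A, A @ x + v, k, d)
    print(f"SNR = {snr_db(A, x, v):.1f} dB, SER = {ser_db(x, x_hat):.1f} dB")
```

If the definitions in (16) and (17) are meant with squared $\ell_2$ norms instead, only the two helper functions need to change.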



7 Conclusion 



In this paper we considered the recovery of a non-negative sparse vector using a sparse measurement matrix in the compressed sensing framework. We used the perturbed adjacency matrix of a bipartite expander graph to construct the sparse measurement matrix and proposed a novel fast algorithm. We computed recovery thresholds and showed that for measurement matrices with non-negative entries and constant column sum, the constraint set $\{x \mid x \ge 0, Ax = y\}$ contains a unique vector whenever $\ell_1$ optimization is successful (which also means that any other optimization scheme would be successful). Finally, determining whether the constructed matrices satisfy a 2-RIP (we conjecture that they do), and constructing $(0,1)$ matrices whose complete rank is proportional to $n$, are open problems that may be worthy of further scrutiny.
A Proof of Theorem 4.1

Assuming that we generate a random matrix $A$ by randomly generating its columns, it suffices to show that the probability that $A$ has the desired expansion property is positive. For $1 \le i_1 < i_2 < \cdots < i_r \le n$, we denote by $E_{i_1,i_2,\ldots,i_r}$ the event that the columns of $A$ corresponding to the indices $i_1, i_2, \ldots, i_r$ have at least $m - r + 1$ entirely zero rows (rows that do not have a single non-zero element in the columns $A_{i_1}, A_{i_2}, \ldots, A_{i_r}$). In other words, $E_{i_1,i_2,\ldots,i_r}$ is the event that the set of nodes $\{i_1, i_2, \ldots, i_r\}$ in $X$ contracts in $Y$.



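$$\mathbb{P}[A \text{ is a } (r_0, 1-\tfrac{1}{d})\text{-expander}] = 1 - \mathbb{P}[A \text{ is not a } (r_0, 1-\tfrac{1}{d})\text{-expander}] \ge 1 - \sum_{d \le r \le r_0}\ \sum_{1 \le i_1 < i_2 < \cdots < i_r \le n} \mathbb{P}[E_{i_1,i_2,\ldots,i_r}]$$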



A combinatorial analysis yields the following:

$$\mathbb{P}[E_{1,2,\ldots,r}] \le \binom{m}{r}\left(\frac{\binom{r}{d}}{\binom{m}{d}}\right)^{r}.$$



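Hence

$$\mathbb{P}[A \text{ is a } (r_0, 1-\tfrac{1}{d})\text{-expander}] \ge 1 - \sum_{r=d}^{r_0}\binom{n}{r}\binom{m}{r}\left(\frac{\binom{r}{d}}{\binom{m}{d}}\right)^{r}. \tag{A.18}$$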

The objective is to show that this probability is positive. Equivalently, we show that for certain regimes of $\beta$, $\mu$ and $d$, the summation on the right-hand side of (A.18) vanishes. To this end, we split the sum into a sub-linear summation and a linear summation. We show that if $d > 2$, the sub-linear part decays polynomially as $n \to \infty$, and the linear part decays exponentially in $n$, provided a certain relationship involving $\beta$, $\mu$ and $d$ holds. We state this in two separate theorems.
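For concrete finite sizes, the right-hand side of (A.18) can be evaluated directly, which shows the first-moment argument at work. The sketch below is illustrative: the helper names are hypothetical, the two parameter choices are arbitrary, and each term is computed in log space with `math.lgamma` so that the huge binomial coefficients never have to be formed; the function returns `math.inf` as soon as a single term exceeds $2^{1000}$.

```python
import math


def log2_comb(a, b):
    """log2 of the binomial coefficient C(a, b), via lgamma to avoid huge integers."""
    return (math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)) / math.log(2)


def union_bound(n, m, d, r0):
    """Sum on the right-hand side of (A.18); returns math.inf once a term exceeds 2**1000."""
    total = 0.0
    for r in range(d, r0 + 1):
        log_term = (log2_comb(n, r) + log2_comb(m, r)
                    + r * (log2_comb(r, d) - log2_comb(m, d)))
        if log_term > 1000:
            return math.inf
        total += 2.0 ** log_term
    return total


print(union_bound(2000, 1000, 8, 100))   # m = n/2, r0 = n/20
print(union_bound(2000, 1000, 3, 900))   # r0 far too large for d = 3
```

Whenever the printed sum is below 1, (A.18) already certifies that a random matrix of that size is a $(r_0, 1-\frac{1}{d})$-expander with positive probability; a huge or infinite sum means only that this particular bound is inconclusive.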

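Theorem A.1. If $d \ge 3$ and $\alpha > 0$ is sufficiently small (depending on $\beta$ and $d$), then $\sum_{r=d}^{\alpha n}\binom{n}{r}\binom{m}{r}\left(\frac{\binom{r}{d}}{\binom{m}{d}}\right)^{r} = O(n^{1-d(d-2)})$.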

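Proof. We can write:

$$\sum_{r=d}^{\alpha n}\binom{n}{r}\binom{m}{r}\left(\frac{\binom{r}{d}}{\binom{m}{d}}\right)^{r} \tag{A.19}$$

$$\le \sum_{r=d}^{\alpha n}\binom{n}{r}\binom{m}{r}\left(\frac{r}{m}\right)^{rd} \tag{A.20}$$

$$\le \sum_{r=d}^{\alpha n}\left(\frac{c\, n\, r^{d-2}}{m^{d-1}}\right)^{r}, \tag{A.21}$$

where $c = e^2$. (A.20) and (A.21) are deduced from the bounds $\binom{r}{d}/\binom{m}{d} \le \left(\frac{r}{m}\right)^d$ for $r < m$, and $\binom{n}{r} \le \left(\frac{ne}{r}\right)^r$, respectively. It is easy to show that when $\alpha$ is sufficiently small, $\left(\frac{c\, n\, r^{d-2}}{m^{d-1}}\right)^{r}$ is decreasing in $r$ for $d \le r \le \alpha n$, and thus replacing all the terms in (A.21) by the first term will only increase the sum. The whole term is thus smaller than $\alpha n\left(\frac{c\, d^{d-2}\, n}{m^{d-1}}\right)^{d} = \lambda n^{1-d(d-2)}$ for some positive constant $\lambda$. ■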

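Theorem A.2. For $m = \beta n$ and $r_0 = \mu n$, if $d > \frac{H(\mu) + \beta H(\mu/\beta)}{\mu\log(\beta/\mu)}$, then for any $0 < \alpha < \mu$ the sum $\sum_{r=\alpha n+1}^{\mu n}\binom{n}{r}\binom{m}{r}\left(\frac{\binom{r}{d}}{\binom{m}{d}}\right)^{r}$ decays exponentially as $n \to \infty$.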

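Proof. Using the standard bounds (F.49) on binomial coefficients, we can write:

$$\sum_{r=\alpha n+1}^{\mu n}\binom{n}{r}\binom{m}{r}\left(\frac{\binom{r}{d}}{\binom{m}{d}}\right)^{r} \le \sum_{r=\alpha n+1}^{\mu n} 2^{nH(\frac{r}{n}) + mH(\frac{r}{m})}\left(\frac{\binom{r}{d}}{\binom{m}{d}}\right)^{r}, \tag{A.22}$$

where $H(x) = x\log\left(\frac{1}{x}\right) + (1-x)\log\left(\frac{1}{1-x}\right)$ is the entropy function. Using $H(\epsilon) = \epsilon\left(\log\frac{1}{\epsilon} + 1\right) + O(\epsilon^2)$ for small $\epsilon$, and the fact that as $n \to \infty$, $\frac{d}{r} \to 0$ and $\frac{d}{m} \to 0$ for $\alpha n < r \le \mu n$, (A.22) can be written as:

$$\sum_{r=\alpha n+1}^{\mu n} 2^{nH(\frac{r}{n}) + mH(\frac{r}{m}) + rd\left(\log\frac{r}{m} + o(1)\right)} = O\left(n\, 2^{n\left(H(\mu) + \beta H(\frac{\mu}{\beta}) + \mu d\log(\frac{\mu}{\beta}) + o(1)\right)}\right). \tag{A.23}$$

(A.23) vanishes if $d > \frac{H(\mu) + \beta H(\mu/\beta)}{\mu\log(\beta/\mu)}$. ■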
The above argument leads to Theorem 4.1.



B Derivation of the Weak Bound 

We start with a straightforward modification of Theorem 3.1:

Theorem B.1. Let $x_0 \in (\mathbb{R}^+)^n$ be fixed, $y = Ax_0$, and denote by $S$ the support of $x_0$. Then the solution $\hat{x}$ of (2) will be identical to $x_0$ if and only if there exists no nonzero $w$ in the null space of $A$ such that $w_{S^c}$ is nonnegative and $\sum_{i=1}^n w_i \le 0$.
Proof. Similar to the proof of Theorem 3.1, with $S$ as the support of $x_0$. ■



Corollary 1. If $A$ is the adjacency matrix of a bipartite graph with constant left degree, $x_0$ is a fixed nonnegative vector and $y = Ax_0$, then the solution $\hat{x}$ of (2) will be identical to $x_0$ if and only if there exists no nonzero $w$ in the null space of $A$ such that $w_{S^c}$ is a nonnegative vector, where $S$ is the support set of $x_0$. In other words, $x_0$ will be recoverable via linear programming from $Ax_0$ provided that the support of $x_0$ does not include the index set of all negative elements of a vector in the null space of $A$.

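Proof. Directly from Theorem B.1 and Lemma [...]: since every column of $A$ sums to $d$, every $w$ in the null space of $A$ satisfies $d\sum_i w_i = \mathbf{1}^T A w = 0$, so the condition on $\sum_i w_i$ in Theorem B.1 holds automatically. ■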



This last statement allows us to derive a combinatorial matching condition for the recovery of a vector supported on a specific subset $S$. We repeat the statement of the lemma:

Lemma B.1. Given a set $S$, consider $\Gamma(S)$ and denote $S_2 = \Gamma(\Gamma(S)) \setminus S$. Let the bipartite two-hop graph of $S$ be denoted by $B_S = (S \cup S_2, \Gamma(S \cup S_2))$. Any non-negative vector $x_0$ supported on $S$ can be recovered from $y = Ax_0$ if every subset $S' \subset S \cup S_2$ of size $|S'| \le |\Gamma(S)| + 1$ has minimal expansion: $|\Gamma(S')| \ge |S'|$.

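Observe that the expectation is (asymptotically) $\mathbb{E}|\Gamma(S)| = (1 - e^{-d\alpha/\beta})\beta n =: \gamma_1 n$. Using a standard Chernoff bound [21], it is easy to show that $|\Gamma(S)|$ is concentrated around its expectation:

$$\mathbb{P}\left[|\Gamma(S)| \le \mathbb{E}|\Gamma(S)| + \epsilon_1\right] \ge 1 - \frac{1}{n},$$

if $|S| \ge c_1 n$, for $\epsilon_1 = c_2\sqrt{n\log n}$. Therefore we define the event $E_1 = \{|\Gamma(S)| \le \gamma_1 n + \epsilon_1\}$.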
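The expression $\gamma_1 n = (1 - e^{-d\alpha/\beta})\beta n$ for the expected neighborhood size is easy to check by simulation under the random model used in this appendix ($d$ entries per column, placed uniformly with repetition). Because $|\Gamma(S)|$ depends only on the $k$ columns in $S$, it suffices to draw $kd$ row indices. The parameter values and the helper name below are arbitrary illustrative choices.

```python
import math

import numpy as np


def neighborhood_sizes(m, d, k, trials, rng):
    """|Gamma(S)| for |S| = k when each column has d entries placed uniformly with repetition."""
    return [np.unique(rng.integers(0, m, size=k * d)).size for _ in range(trials)]


n, alpha, beta, d = 10_000, 0.1, 0.5, 3
m, k = int(beta * n), int(alpha * n)
gamma1 = (1 - math.exp(-d * alpha / beta)) * beta
sizes = neighborhood_sizes(m, d, k, trials=20, rng=np.random.default_rng(4))
print(np.mean(sizes), gamma1 * n)
```

Each right node is missed by all $kd$ placements with probability $(1 - 1/m)^{kd} \approx e^{-d\alpha/\beta}$, which is where the formula comes from; the spread of `sizes` around the mean also gives a feel for the $\sqrt{n\log n}$ concentration window $\epsilon_1$.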

Consider the random graph created by placing $d$ non-zero entries (with repetition) in every column of $A$. From the set $S$, form $\Gamma(S)$, the corresponding $S_2$, and finally the bipartite graph $B_S = (S \cup S_2, \Gamma(S \cup S_2))$. Using the combinatorial condition of Lemma B.1, we can recover a signal supported on $S$ if every subset $S_1 \subset S \cup S_2$ of size $|S_1| = r \le |\Gamma(S)| + 1$ has sufficient expansion: $|\Gamma(S_1)| \ge r$ (note that subsequently we drop the $+1$ term since it is negligible for large $n$). First we condition on the concentration of $|\Gamma(S)|$:

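$$\mathbb{P}[S \text{ not recoverable}] = \mathbb{P}[S \text{ not recoverable} \mid E_1]\,\mathbb{P}[E_1] + \mathbb{P}[S \text{ not recoverable} \mid E_1^c]\,\mathbb{P}[E_1^c] \tag{B.24}$$

$$\le \mathbb{P}[S \text{ not recoverable} \mid E_1] + \frac{1}{n}, \tag{B.25}$$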

therefore it suffices to bound the probability conditioned on $|\Gamma(S)|$ being concentrated. We take a union bound over all possible selections of $r_1$ nodes in $S$ and $r_2$ nodes in $S_2$ such that $r_1 + r_2 \le |\Gamma(S)|$. Since we are conditioning on $E_1$, we have $|\Gamma(S)| \le \gamma_1 n + \epsilon_1$, and it suffices (up to the lower-order term $\epsilon_1$) to consider $r_1 + r_2 \le \gamma_1 n$. The second problem is that the set $S_2$ is random and depends on $\Gamma(S)$. We avoid this conditioning by allowing the choice of the $r_2$ nodes to range over all $n - k$ nodes in $S^c$.

$$\mathbb{P}[S \text{ not recoverable} \mid E_1] \le \sum_{r_1 + r_2 \le \gamma_1 n}\binom{k}{r_1}\binom{n-k}{r_2}\,\mathbb{P}(r_1, r_2 \text{ contracts} \mid E_1). \tag{B.26}$$

Now the problem is that conditioning on $E_1$ implies that the set $S$ does not expand too much, so it actually increases the probability of the bad contraction event. We can, however, easily show that this increase is at most a factor of 2:

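$$\mathbb{P}(r_1, r_2 \text{ contracts} \mid E_1) = \frac{\mathbb{P}(r_1, r_2 \text{ contracts} \cap E_1)}{\mathbb{P}(E_1)} \le \frac{\mathbb{P}(r_1, r_2 \text{ contracts})}{\mathbb{P}(E_1)}. \tag{B.27}$$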
Now, since $\mathbb{P}(E_1) \ge 1 - \frac{1}{n}$, for sufficiently large $n$ we have $1/\mathbb{P}(E_1) < 2$, so

$$\mathbb{P}(r_1, r_2 \text{ contracts} \mid E_1) \le 2\,\mathbb{P}(r_1, r_2 \text{ contracts}). \tag{B.28}$$

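The probability that the set of $r_1 + r_2$ nodes contracts can be further bounded by assuming $|\Gamma(r_1, r_2)| = r_1 + r_2$ (any smaller neighborhood will have smaller probability), so

$$\mathbb{P}(r_1, r_2 \text{ contracts}) \le \binom{m}{r_1 + r_2}\left(\frac{r_1 + r_2}{m}\right)^{d(r_1 + r_2)}.$$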

Putting everything together, we obtain the bound

$$\mathbb{P}[S \text{ not recoverable} \mid E_1] \le 2\sum_{r_1 + r_2 \le \gamma_1 n}\binom{k}{r_1}\binom{n-k}{r_2}\binom{m}{r_1 + r_2}\left(\frac{r_1 + r_2}{m}\right)^{d(r_1 + r_2)}. \tag{B.29}$$

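We move everything to the exponent and use standard binomial approximations to obtain

$$\mathbb{P}[S \text{ not recoverable} \mid E_1] \le 2\sum_{r_1 + r_2 \le \gamma_1 n} 2^{kH(\frac{r_1}{k}) + (n-k)H(\frac{r_2}{n-k}) + mH(\frac{r_1 + r_2}{m}) + d(r_1 + r_2)\log(\frac{r_1 + r_2}{m})}. \tag{B.30}$$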

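Recall that the recoverable fraction is given by $k = \alpha n$ and $m = \beta n$, and denote $p_1 = r_1/n$, $p_2 = r_2/n$. Define the function

$$F(p_1,p_2) := \alpha H\left(\frac{p_1}{\alpha}\right) + (1-\alpha) H\left(\frac{p_2}{1-\alpha}\right) + \beta H\left(\frac{p_1+p_2}{\beta}\right) + d(p_1+p_2)\log\left(\frac{p_1+p_2}{\beta}\right), \tag{B.31}$$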

and observe that the bound (B.30) on the probability of failure becomes

$$\mathbb{P}[S \text{ not recoverable} \mid E_1] \le 2\sum_{p_1 + p_2 \le \gamma_1} 2^{nF(p_1,p_2)}.$$

Therefore, for fixed $\beta$ and $d$, we seek the largest $\alpha^*$ that makes $F(p_1,p_2)$ negative for every $p_1, p_2$ with $p_1 + p_2 \le \gamma_1$. For this $\alpha^*$, we can recover with exponentially high probability, conditioned on the sublinear sets not contracting (which has already been established).

C Proof of the Validity of Algorithm 1

As mentioned earlier, and as illustrated in Figure 4, the algorithm begins by identifying a large zero portion of the measurement vector and eliminating two large sets of nodes from $X$ and $Y$. Having done this, we are left with a much smaller system of linear equations, which turns out to be over-determined and can be uniquely solved using matrix inversion. The theoretical justification of this statement, and why the algorithm works, is given below. The fact that the system is over-determined is secured by the expansion property of the measurement matrix and its proportional complete rank. Figure 4 is a graphical illustration of how Algorithm 1 decomposes $X$ and $Y$ into subsets and makes the search and observation spaces shrink from $X$ and $Y$ to $S_2$ and $T_2$, respectively. Intuitively speaking, when we isolate a large proportion of the nodes on the right, their neighbors in $X$ are numerous enough to leave us with a set of nodes on the left that is smaller than the set of remaining nodes on the right.

Figure 4: Decomposition of nodes and edges by Algorithm 1 (left nodes $X = S_1 \cup S_2$, right nodes $Y = T_1 \cup T_2$).

This procedure is therefore nothing but a block triangularization (by rearranging rows and columns) of $A$ into a block lower triangular matrix:




$$A = \begin{bmatrix} A_{11} & 0 \\ A_{12} & A_2 \end{bmatrix}, \tag{C.32}$$



where $A_2$ is a square or tall full-rank matrix. The following theorem certifies that Algorithm 1 is indeed valid and recovers any $k$-sparse nonnegative vector without error.

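Theorem C.1 (Validity of Algorithm 1). If $x$ is a $k$-sparse non-negative vector and $A$ is a perturbed $(kd+1, 1-\frac{1}{d})$ expander with $\mathrm{Cr}(A) \ge kd+1$, then:

1. $y$ is $kd$-sparse and therefore has at least $m - kd$ zeros.

2. $|S_2| \le |T_2|$ and $A_2$ has full column rank.

3. $\hat{x} = x$.

Proof.

1. Trivial, by noting that the graph representation of $A$ is left $d$-regular: the at most $k$ nonzero entries of $x$ touch at most $kd$ entries of $y$.

2. Every node in $S_2$ has all of its neighbors in $T_2$, since $S_1 = \Gamma(T_1)$ contains every node adjacent to $T_1$; hence $\Gamma(S_2) \subseteq T_2$. Suppose $|S_2| > kd$ and select an arbitrary subset $S_2' \subseteq S_2$ of size $kd+1$. Because of the expansion property, $|\Gamma(S_2')| \ge kd+1 > kd \ge |T_2|$. But $\Gamma(S_2') \subseteq T_2$, which is a contradiction. Therefore $|S_2| \le kd$, and applying the expansion property to $S_2$ itself gives $|S_2| \le |\Gamma(S_2)| \le |T_2|$. Moreover, since $|S_2| < \mathrm{Cr}(A)$, the columns of $A$ indexed by $S_2$ are linearly independent; by the block structure (C.32) these columns vanish on $T_1$, so $A_2$ has full column rank.

3. Every node in $Y$ is an equation with nonnegative variables from $x$ and positive weights from the edges of the graph. If any entry of $x_{S_1}$ is greater than zero, then the equations corresponding to its neighbors in $T_1$ are not zero (and there is at least one such equation since $S_1 = \Gamma(T_1)$). This contradicts the choice of $T_1$. Therefore $x_{S_1} = 0 = \hat{x}_{S_1}$. Also, since $A_2 x_{S_2} = y_2$, $A_2 \hat{x}_{S_2} = y_2$, and $A_2$ has full column rank, we conclude that $\hat{x}_{S_2} = x_{S_2}$. ■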
Remark 2. Note that in proving the last part of Theorem C.1 we used the fact that $x$ is a non-negative vector. This proof does not hold for general sparse vectors.

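Remark 3. By proving part 2 of Theorem C.1 we implicitly proved that every expander from $X$ to $Y$ is a contractor from $Y$ to $X$. This can be generalized as follows: for every $T \subseteq Y$ with $|T| < kd + 1$, the set of nodes in $X$ all of whose neighbors lie in $T$ has at most $|T|$ elements, i.e.

$$\forall T \subseteq Y,\ |T| < kd+1 \implies \left|\{i \in X : \Gamma(\{i\}) \subseteq T\}\right| \le |T|. \tag{C.33}$$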



D 1-RIP for Expanders 

We present a simple argument that the perturbed matrix $A$ has the 1-RIP property if the underlying graph is a $(k,\epsilon)$-expander, for $\epsilon < 1/2$. The 1-RIP property states that for every $k$-sparse vector $x$ and suitable constants $c_1, c_2$, the $\ell_1$ norm $\|Ax\|_1$ is close to the $\ell_1$ norm of $x$:

$$(1 - c_1)\|x\|_1 \le \|Ax\|_1 \le (1 + c_2)\|x\|_1. \tag{D.34}$$

Berinde et al. [6] already show that adjacency matrices of expander graphs have this property, also for $\ell_p$ norms with $1 \le p \le 1 + 1/\log n$. The argument we present here also requires $\epsilon < 1/2$, but it is arguably simpler and easily extends to the perturbed case.

Consider $A$ to be the perturbed adjacency matrix of a $(k,\epsilon)$ unbalanced expander with $\epsilon < 1/2$, where each nonzero entry lies in $[1-\epsilon_1, 1+\epsilon_1]$. Consider $S$, the support set of $x$. By Hall's theorem, since every subset $S' \subseteq S$ has at least $d(1-\epsilon)|S'|$ neighbors, there must exist a $d(1-\epsilon)$-matching, i.e., every node in $S$ can be matched to $d(1-\epsilon)$ right nodes. Decompose the measurement matrix as

$$A = A_M + A_C, \tag{D.35}$$

where $A_M$ is supported on the $d(1-\epsilon)$-matching (i.e., every row has at most one non-zero entry in the columns of $S$, and every column has $d(1-\epsilon)$ non-zero entries). The remainder matrix $A_C$ has $\epsilon d$ non-zero entries in each column; notice that the decomposition is adapted to the support of the vector $x$. By the triangle inequality:

$$\|Ax\|_1 \ge \|A_M x\|_1 - \|A_C x\|_1. \tag{D.36}$$

It is easy to see that

$$\|A_M x\|_1 \ge (1-\epsilon_1)d(1-\epsilon)\|x\|_1, \tag{D.37}$$

since $A_M x$ is a vector that contains $d(1-\epsilon)$ copies of each entry of $x$ multiplied by coefficients in $[1-\epsilon_1, 1+\epsilon_1]$. Also, since each column of $A_C$ contains $\epsilon d$ non-zero entries,

$$\|A_C x\|_1 \le (1+\epsilon_1)\epsilon d\|x\|_1, \tag{D.38}$$

since each entry of $A_C x$ is a sum of terms of $x$, and $\|A_C x\|_1$ is therefore bounded by a sum in which each entry of $x$ appears $\epsilon d$ times, multiplied by coefficients in $[1-\epsilon_1, 1+\epsilon_1]$. The same argument yields the upper bound:

$$\|Ax\|_1 \le (1+\epsilon_1)d\|x\|_1. \tag{D.39}$$

Therefore, putting these together we obtain:

$$d(1 - \epsilon_1 - 2\epsilon)\|x\|_1 \le \|Ax\|_1 \le (1+\epsilon_1)d\|x\|_1. \tag{D.40}$$

Therefore we need an expander with $\epsilon < (1-\epsilon_1)/2$, which for arbitrarily small perturbations tends to the $\epsilon < 1/2$ limit.
E Proof of Robustness of Algorithm 2

We first state the following lemma from [6]: 

Lemma E.1 (consequence of Lemma 9 of [6]). If $A$ is the adjacency matrix of a $(k,\epsilon)$-expander with $\epsilon < 0.5$ and $u$ is a $k$-sparse vector, then $d(1-2\epsilon)\|u\|_1 \le \|Au\|_1 \le d\|u\|_1$.

We have presented a new proof of this lemma in Appendix D, based on the generalized Hall's theorem. That proof allows us to state with certainty that, with suitable perturbations, $A$ will also have a 1-RIP.

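By rearranging the rows and columns of $A$, we may assume

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \quad v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}, \quad A = \begin{bmatrix} A_{11} & 0 \\ A_{12} & A_2 \end{bmatrix},$$

where $y_1$ and $y_2$ are those obtained by the algorithm, $x_1 = x_{S_1}$ and $x_2 = x_{S_2}$. Also let $e = x - \hat{x}$ be the reconstruction error vector. By (15) we then have

$$y_1 = A_{11}x_1 + v_1, \tag{E.41}$$

$$y_2 = A_{12}x_1 + A_2 x_2 + v_2, \tag{E.42}$$

$$e = \begin{bmatrix} x_1 \\ x_2 - \hat{x}_2 \end{bmatrix}. \tag{E.43}$$

Hence we have:

$$\|x_1\|_1 \le \|A_{11}x_1\|_1 = \|y_1 - v_1\|_1 \le \|y_1\|_1 + \|v_1\|_1 \le 2\|v_1\|_1. \tag{E.44}$$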

The first inequality holds as a result of the nonnegativity of $x_1$ and $A_{11}$, and the fact that every column of $A_{11}$ has at least one nonzero entry. The last inequality is a consequence of the choice of $y_1$ in step 1 of the algorithm and the fact that $Ax$ is $kd$-sparse. Let us write $y_2 = A_2\hat{x}_2 + \delta_2$. From the way $\hat{x}_2$ is derived in step 4 of the algorithm (the candidate $z = x_2$ leaves the residual $y_2 - A_2 x_2 = A_{12}x_1 + v_2$), it follows that:

$$\|\delta_2\|_1 \le \|A_{12}x_1 + v_2\|_1. \tag{E.45}$$

And thus

$$\|A_2(x_2 - \hat{x}_2)\|_1 = \|\delta_2 - A_{12}x_1 - v_2\|_1 \le 2\|A_{12}x_1 + v_2\|_1 \le 2d\|x_1\|_1 + 2\|v_2\|_1. \tag{E.46}$$

Using this along with the 1-RIP condition of Lemma E.1 for the sparse vector $u = x_2 - \hat{x}_2$, we get:

$$c_1\|x_2 - \hat{x}_2\|_1 \le 2d\|x_1\|_1 + 2\|v_2\|_1, \tag{E.47}$$

where $c_1 = (1-2\epsilon)d$. Equations (E.44) and (E.47) result in:

$$\|e\|_1 \le 2\|v_1\|_1 + \frac{4d}{c_1}\|v_1\|_1 + \frac{2}{c_1}\|v_2\|_1 \le \frac{6-4\epsilon}{1-2\epsilon}\|v\|_1. \tag{E.48}$$

Therefore we have been able to bound the $\ell_1$ norm of the error by a constant factor times the $\ell_1$ norm of the noise, as desired.
Remark 4. As soon as a $p$-RIP can be proven to hold for $A$, a similar statement bounding the $\ell_p$ norm of the error by that of the noise can be established. However, step 4 of the algorithm then needs to be revised, with $\ell_1$ optimization replaced by $\ell_p$ optimization. In particular, $p = 2$ is of practical interest, and the $\ell_2$ optimization of step 4 is equivalent to the pseudo-inverse multiplication of the noiseless algorithm. However, as mentioned earlier, the 2-RIP does not hold in general for 0-1 matrices. We speculate that a suitable perturbation can push the singular values of $A$ above a constant and thus give $A$ a 2-RIP.



F Elementary bounds on binomial coefficients 



For each $\beta \in (0,1)$, define the binomial entropy $H(\beta) = -\beta\log_2\beta - (1-\beta)\log_2(1-\beta)$ (and $H(0) = H(1) = 0$ by continuity). We make use of the following standard bounds on the binomial coefficients:

$$H\left(\frac{k}{n}\right) - \frac{\log_2(n+1)}{n} \le \frac{1}{n}\log_2\binom{n}{k} \le H\left(\frac{k}{n}\right). \tag{F.49}$$
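Both inequalities in (F.49) can be verified exhaustively for moderate $n$ with exact integer binomial coefficients. The short check below is only a numerical illustration (the function names and the range of $n$ are arbitrary), not a proof; a tiny slack absorbs floating-point rounding at the endpoints $k = 0$ and $k = n$, where both sides are equal.

```python
import math


def H(x):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def f49_holds(n, slack=1e-12):
    """Check both inequalities of (F.49) for every k = 0, ..., n."""
    for k in range(n + 1):
        middle = math.log2(math.comb(n, k)) / n
        lower = H(k / n) - math.log2(n + 1) / n
        if not (lower - slack <= middle <= H(k / n) + slack):
            return False
    return True


print(all(f49_holds(n) for n in range(1, 200)))   # True
```

The $\log_2(n+1)/n$ gap between the two sides vanishes as $n \to \infty$, which is why only the entropy terms survive in the exponents of (A.23) and (B.30).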